Posted to solr-user@lucene.apache.org by sc...@asia.com on 2010/07/31 09:29:45 UTC

DIH: Rows fetch OK, Total Documents Failed??

Hi,

I'm a bit lost with this. I'm trying to import a new XML file via DIH: all rows are fetched, but no documents are indexed. I can't find any log entry or error.

Any ideas?

Here is the STATUS:

 
<str name="command">status</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">7554</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2010-07-31 10:14:33</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Failed">7554</str>
<str name="Time taken ">0:0:4.720</str>
</lst>


My XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<products>
    <product>
        <title>Moniteur VG1930wm 19 LCD Viewsonic</title>
        <url>http://xxxxx.com/abc?a(12073231)p(2822679)prod(89042332277)ttid(5)url(http%3A%2F%2Fwww.ffdsssd.com%2Fproductinformation%2F%7E66297%7E%2Fproduct.htm%26sender%3D2003)</url>
        <content>Moniteur VG1930wm 19  LCD Viewsonic VG1930WM</content>
        <price>247.57</price>
        <category>Ecrans</category>
    </product>
etc...

and my dataconfig:

<dataConfig>
        <dataSource type="URLDataSource" />
        <document>
                <entity name="products"
                        url="file:///home/john/Desktop/src.xml"
                        processor="XPathEntityProcessor"
                        forEach="/products/product"
                        transformer="DateFormatTransformer">

                        <field column="id"      xpath="/products/product/url"   commonField="true" />
                        <field column="title"   xpath="/products/product/title" commonField="true" />
                        <field column="category"  xpath="/products/product/category" />
                        <field column="content"  xpath="/products/product/content" />
                        <field column="price"      xpath="/products/product/price" />
     
                </entity>
        </document>
</dataConfig>





RE: Rows fetch OK, Total Documents Failed??

Posted by Michael Griffiths <mg...@am-ind.com>.
Check your schema.xml; one of the fields is probably marked as required, and no field extracted by DIH is matching it. Keep in mind that field names in schema.xml are case-sensitive.
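
As a rough illustration of what to look for, here is a minimal sketch of a schema.xml fragment. The field names mirror the columns in the posted dataconfig; the type names ("string", "text", "float") and the schema name are assumptions based on a typical example schema, not the poster's actual file, and the <types> section is left out.

```xml
<!-- Sketch only: hypothetical field definitions, not the poster's real schema.
     The <types> section that defines "string", "text" and "float" is omitted. -->
<schema name="products" version="1.2">
  <fields>
    <!-- A required field: every document must supply a value for it,
         and the DIH column name must match "id" exactly, including case. -->
    <field name="id"       type="string" indexed="true" stored="true" required="true" />
    <field name="title"    type="text"   indexed="true" stored="true" />
    <field name="category" type="string" indexed="true" stored="true" />
    <field name="content"  type="text"   indexed="true" stored="true" />
    <field name="price"    type="float"  indexed="true" stored="true" />
  </fields>

  <!-- The uniqueKey field must also be present in every document. -->
  <uniqueKey>id</uniqueKey>
</schema>
```

If the real schema declares some other field with required="true" (or a uniqueKey other than "id") that the dataconfig never fills, that would be consistent with every document failing.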





Re: DIH: Rows fetch OK, Total Documents Failed??

Posted by Alexey Serba <as...@gmail.com>.
Do you have any required fields or uniqueKey in your schema.xml? Do
you provide values for all these fields?

AFAIU you don't need the commonField attribute for the id and title fields. I
don't think that's your problem, but anyway...
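
For reference, a sketch of the same dataconfig with the commonField attributes dropped. Everything else is copied from the original post; whether "id" is the uniqueKey or a required field in the schema is an assumption that needs checking against schema.xml.

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="products"
            url="file:///home/john/Desktop/src.xml"
            processor="XPathEntityProcessor"
            forEach="/products/product"
            transformer="DateFormatTransformer">

      <!-- commonField removed: each <product> carries its own url and title.
           "id" is assumed to be the uniqueKey in schema.xml. -->
      <field column="id"       xpath="/products/product/url" />
      <field column="title"    xpath="/products/product/title" />
      <field column="category" xpath="/products/product/category" />
      <field column="content"  xpath="/products/product/content" />
      <field column="price"    xpath="/products/product/price" />

    </entity>
  </document>
</dataConfig>
```

Each column name here has to line up with a field name in schema.xml, and any required field or uniqueKey in the schema needs a value from one of these columns.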

