Posted to solr-user@lucene.apache.org by Daniel Bradley <Da...@adfero.co.uk> on 2009/09/22 19:23:40 UTC

Oracle incomplete DataImport results

I appear to be getting only a small number of items imported into Solr
when doing a full-import against an Oracle data source. The query I'm
running is approximately:

SELECT "ID", dbms_lob.substr("Text", 4000, 1) "Text", "Date",
"LastModified", "Type", "Created", "Available", "Parent", "Title" from
"TheTableName" where "Available" < CURRENT_DATE and "Available" >
add_months(current_date, -1)

This retrieves the last month's items from the database (the
dbms_lob.substr function is used because Text is an Oracle CLOB; without
it, Solr would simply index the object's name rather than its contents).
When I run this query in Oracle SQL Developer, approximately 5600 rows are
returned; however, a full import only imports approximately 550 items.
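
For reference, the relevant part of data-config.xml is set up roughly along
these lines (a simplified sketch: the connection details and the
${article.ID} join are placeholders, and the Levels concatenation in the
child query is omitted):

<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@dbhost:1521:SID"
              user="user" password="password"/>
  <document>
    <!-- root entity: one Solr document per row of the main query -->
    <entity name="article"
            query='SELECT "ID", dbms_lob.substr("Text", 4000, 1) "Text", "Date",
                          "LastModified", "Type", "Created", "Available", "Parent", "Title"
                   FROM "TheTableName"
                   WHERE "Available" &lt; CURRENT_DATE
                     AND "Available" &gt; add_months(current_date, -1)'>
      <!-- child entity: runs one extra query per parent row -->
      <entity name="codes"
              query='SELECT TO_NCHAR("TheCategories"."Value") "Value"
                     FROM "TheCategories"
                     WHERE "TheCategories".ID = ${article.ID}'/>
    </entity>
  </document>
</dataConfig>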

There's no sign of excessive memory use and no exceptions suggesting a
memory problem. Is there any limit on the number of items you can import
in a single request? Any other thoughts on this problem would be much
appreciated.

Thanks



Other Information:

Running the command:
http://xxx.xxx.xxx.xxx:8080/solr/dataimport?command=full-import

Produces the output:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:5:43.58</str>
    <str name="Total Requests made to DataSource">559</str>
    <str name="Total Rows Fetched">4726</str>
    <str name="Total Documents Processed">557</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2009-09-22 16:58:46</str>
  </lst>
  <str name="WARNING">This response format is experimental.  It is
likely to change in the future.</str>
</response>

Running the command:
http://xxx.xxx.xxx.xxx:8080/solr/dataimport?command=full-import&debug=on&verbose=true

Produces the following output (dots added where content is not
relevant):

<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">40906</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="mode">debug</str>
  <arr name="documents">
    ...
  </arr>
  <lst name="verbose-output">
    <lst name="entity:article">
      <lst name="document#1">
        <str name="query">SELECT "ID", dbms_lob.substr("Text", 4000, 1)
"Text", "Date", "LastModified", "Type", "Created", "Available",
"Parent", "Title" from "TheTableName" where "Available" &lt;
CURRENT_DATE and "Available" &gt; add_months(current_date, -1)</str>
        <str name="time-taken">0:0:7.766</str>
        <str>----------- row #1-------------</str>
        <date name="Available">2009-08-22T16:04:04Z</date>
        <str name="Parent">java.math.BigDecimal:0</str>
        <str name="Text">
          ...
        </str>
        <date name="Created">2009-08-22T16:04:04Z</date>
        <date name="Date">2009-08-22T16:04:04Z</date>
        <str name="Type">java.math.BigDecimal:235</str>
        <str name="ID">java.math.BigDecimal:1320541</str>
        <date name="LastModified">2009-08-22T16:04:58Z</date>
        <str name="Title">...</str>
        <str>---------------------------------------------</str>
        <lst name="entity:codes">
          <str name="query">SELECT
CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT("Level1",' '),"Level2"), ' '),
"Level3"), ' '), "Level4") "Levels", TO_NCHAR("TheCategories"."Value")
"Value" FROM "TheCategories" WHERE "TheCategories".ID='1320541'</str>
          <str name="time-taken">0:0:5.485</str>
          <str>----------- row #1-------------</str>
          <str name="Levels">12 235 1848 </str>
          <null name="Value"/>
          <str>---------------------------------------------</str>
          ...
        </lst>
      </lst>
      ...
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">11</str>
    <str name="Total Rows Fetched">93</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2009-09-22 16:47:28</str>
    <str name="Time taken ">0:0:39.47</str>
  </lst>
  <str name="WARNING">This response format is experimental.  It is
likely to change in the future.</str>
</response>




Re: Oracle incomplete DataImport results

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Sep 23, 2009 at 3:53 PM, Daniel Bradley <Daniel.Bradley@adfero.co.uk
> wrote:

> After investigating the log files, I found that the DataImporter was
> throwing an error from the Oracle DB driver:
>
> java.sql.SQLException: ORA-22835: Buffer too small for CLOB to CHAR or BLOB
> to RAW conversion (actual: 2890, maximum: 2000)
>
> In other words, there was a problem with the 551st item: a related item
> had a text field of type CLOB that was too long, which caused the
> TO_NCHAR function used to fix the type to fail.
>
> FIX:
> Used the Oracle function dbms_lob.substr("FIELD_NAME", MAX_LENGTH, 1) to
> just trim the string (this also applies an implicit conversion).
>
>
Phew, tricky one! Thanks for bringing closure.

-- 
Regards,
Shalin Shekhar Mangar.

RE: Oracle incomplete DataImport results

Posted by Daniel Bradley <Da...@Adfero.co.uk>.
After investigating the log files, I found that the DataImporter was throwing an error from the Oracle DB driver:

java.sql.SQLException: ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 2890, maximum: 2000)

In other words, there was a problem with the 551st item: a related item had a text field of type CLOB that was too long, which caused the TO_NCHAR function used to fix the type to fail.

FIX:
Used the Oracle function dbms_lob.substr("FIELD_NAME", MAX_LENGTH, 1) to just trim the string (this also applies an implicit conversion).
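
For anyone hitting the same error, the change to the child query was roughly
as follows (2000 is just an example length here, and the Levels column is
left out):

-- Before: converting the whole CLOB fails once its length exceeds the
-- conversion buffer, raising ORA-22835
SELECT TO_NCHAR("TheCategories"."Value") "Value"
FROM "TheCategories"
WHERE "TheCategories".ID = '1320541'

-- After: take at most 2000 characters starting at offset 1;
-- dbms_lob.substr returns character data, so no separate conversion is needed
SELECT dbms_lob.substr("TheCategories"."Value", 2000, 1) "Value"
FROM "TheCategories"
WHERE "TheCategories".ID = '1320541'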

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com] 
Sent: 22 September 2009 19:27
To: solr-user@lucene.apache.org
Subject: Re: Oracle incomplete DataImport results

On Tue, Sep 22, 2009 at 10:53 PM, Daniel Bradley <
Daniel.Bradley@adfero.co.uk> wrote:

> I appear to be getting only a small number of items imported into Solr
> when doing a full-import against an oracle data-provider. The query I'm
> running is something approximately similar to:
>
> SELECT "ID", dbms_lob.substr("Text", 4000, 1) "Text", "Date",
> "LastModified", "Type", "Created", "Available", "Parent", "Title" from
> "TheTableName" where "Available" < CURRENT_DATE and "Available" >
> add_months(current_date, -1)
>
> This retrieves the last month's items from the database (The
> dbms_lob.substr function is used to avoid Solr simply indexing the
> object name as Text is the Oracle clob type). When running this in
> oracle sql developer approximately 5600 rows are returned however
> running a full import only imports approximately 550 items.
>
> There's no visible memory use and no exceptions suggesting any problems
> with lack of memory. Is there any limiting of the number of items you
> can import in a single request? Any other thoughts on this problem would
> be much appreciated.
>
>
What is the uniqueKey in schema.xml? Is it possible that many of those 5600
rows share the same value for Solr's uniqueKey field?

There are no limits on the number of items you can import. The number of
documents created should correspond to the number of rows returned by the
root level entity's query (assuming the uniqueKey for each of those
documents is actually unique).

-- 
Regards,
Shalin Shekhar Mangar.



Re: Oracle incomplete DataImport results

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Sep 22, 2009 at 10:53 PM, Daniel Bradley <
Daniel.Bradley@adfero.co.uk> wrote:

> I appear to be getting only a small number of items imported into Solr
> when doing a full-import against an oracle data-provider. The query I'm
> running is something approximately similar to:
>
> SELECT "ID", dbms_lob.substr("Text", 4000, 1) "Text", "Date",
> "LastModified", "Type", "Created", "Available", "Parent", "Title" from
> "TheTableName" where "Available" < CURRENT_DATE and "Available" >
> add_months(current_date, -1)
>
> This retrieves the last month's items from the database (The
> dbms_lob.substr function is used to avoid Solr simply indexing the
> object name as Text is the Oracle clob type). When running this in
> oracle sql developer approximately 5600 rows are returned however
> running a full import only imports approximately 550 items.
>
> There's no visible memory use and no exceptions suggesting any problems
> with lack of memory. Is there any limiting of the number of items you
> can import in a single request? Any other thoughts on this problem would
> be much appreciated.
>
>
What is the uniqueKey in schema.xml? Is it possible that many of those 5600
rows share the same value for Solr's uniqueKey field?

There are no limits on the number of items you can import. The number of
documents created should correspond to the number of rows returned by the
root level entity's query (assuming the uniqueKey for each of those
documents is actually unique).
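
For example, with a schema along these lines (the field name is just
illustrative), every row that produces the same id value replaces the
previous document instead of being added as a new one:

<!-- schema.xml: documents sharing the same value in the uniqueKey field
     overwrite one another during import -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>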

-- 
Regards,
Shalin Shekhar Mangar.