Posted to solr-user@lucene.apache.org by marotosg <ma...@gmail.com> on 2017/04/01 22:17:42 UTC

DataImportHandler OutOfMemory Mysql

Hi,

I am trying to load a big table into Solr using DataImportHandler and MySQL.
I am getting an OutOfMemory error because Solr is trying to load the full
table into memory. I have read several posts on this and tried batchSize="-1", as suggested here:
https://wiki.apache.org/solr/DataImportHandlerFaq

Do you have any idea what could be the issue?
Completely lost here.

Solr 6.4.1
mysql-connector-java-5.1.41-bin.jar

data-config 

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://188.68.190.85:3306/jobsdb"
              user="suer"
              password="passowrd"/>
  <document>
    <entity name="jobsearch"
            pk="id"
            batchSize="-1"
            query="select * from job"
            deltaImportQuery="SELECT * from job WHERE id='${dih.delta.id}'"
            deltaQuery="SELECT id FROM job WHERE updated_at > '${dih.last_index_time}'">
      <field column="job_id" name="JobID"/>
      <field column="position" name="Position"/>
      <field column="employment_type" name="EmploymentType"/>
      <field column="description" name="Description"/>
      <field column="category" name="Category"/>
      <field column="apply_url" name="ApplyUrl"/>
      <field column="description_url" name="DescriptionUrl"/>
      <field column="company" name="Company"/>
      <field column="city" name="City"/>
      <field column="country_subdivision1" name="CountrySubdivision1"/>
      <field column="country_subdivision2" name="CountrySubdivision2"/>
      <field column="country" name="Country"/>
      <field column="source" name="Source"/>
      <field column="created_at" name="CreatedAt"/>
      <field column="updated_at" name="UpdatedAt"/>
    </entity>
  </document>
</dataConfig>

Thanks
Sergio




Re: DataImportHandler OutOfMemory Mysql

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/1/2017 4:17 PM, marotosg wrote:
> I am trying to load a big table into Solr using DataImportHandler and Mysql. 
> I am getting OutOfMemory error because Solr is trying to load the full
> table. I have been reading different posts and tried batchSize="-1". 
> https://wiki.apache.org/solr/DataImportHandlerFaq
>
> Do you have any idea what could be the issue?
> Completely lost here.
>
> Solr.6.4.1
> mysql-connector-java-5.1.41-bin.jar
>
> data-config 
>
> <dataSource type="JdbcDataSource" 
>             driver="com.mysql.jdbc.Driver"
>             url="jdbc:mysql://188.68.190.85:3306/jobsdb" 
>             user="suer" 
>             password="passowrd"/>
> <document>
>   <entity name="jobsearch"  
>     pk="id"
> 	batchSize="-1"

Setting batchSize to -1 is the proper solution, but you've got it in the
wrong place.  It goes on dataSource, not on entity.

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

When batchSize is -1, DIH executes setFetchSize(Integer.MIN_VALUE) on
the JDBC statement.  This causes the MySQL JDBC driver to stream the
results instead of buffering them.
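
As a rough sketch of that change, assuming everything else in the posted
data-config stays as it is, the attribute simply moves from the entity to the
dataSource. The comment standing in for the field list is only a placeholder;
the mappings themselves are unchanged.

```xml
<dataConfig>
  <!-- batchSize="-1" belongs here, on the dataSource -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://188.68.190.85:3306/jobsdb"
              user="suer"
              password="passowrd"
              batchSize="-1"/>
  <document>
    <!-- batchSize removed from the entity -->
    <entity name="jobsearch"
            pk="id"
            query="select * from job"
            deltaImportQuery="SELECT * from job WHERE id='${dih.delta.id}'"
            deltaQuery="SELECT id FROM job WHERE updated_at > '${dih.last_index_time}'">
      <!-- field mappings exactly as in the original config -->
    </entity>
  </document>
</dataConfig>
```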

You should upgrade to 6.4.2 or 6.5.0.  6.4.0 and 6.4.1 have a serious
performance bug.

https://issues.apache.org/jira/browse/SOLR-10130

You may also want to raise the maxMergeCount setting in the
mergeScheduler config to at least 6.  I ran into a problem with
the database disconnecting while importing millions of rows with DIH
from MySQL; this was the solution.  See this thread:

http://lucene.472066.n3.nabble.com/Closed-connection-issue-while-doing-dataimport-td4327116.html
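
For illustration, the setting lives in the indexConfig section of
solrconfig.xml and would look roughly like the sketch below. The
maxThreadCount value of 1 is an assumption of this example (a common choice
for a single spinning disk), not something required for the fix; the part that
matters here is maxMergeCount.

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- assumption: 1 merge thread, adjust for your storage -->
    <int name="maxThreadCount">1</int>
    <!-- at least 6, so that indexing is less likely to stall behind queued merges during a long import -->
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
</indexConfig>
```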

Thanks,
Shawn


Re: DataImportHandler OutOfMemory Mysql

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Sergio.

Have you tried Integer.MIN_VALUE (-2147483648)? See
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html



-- 
Sincerely yours
Mikhail Khludnev