Posted to solr-user@lucene.apache.org by marotosg <ma...@gmail.com> on 2017/04/01 22:17:42 UTC
DataImportHandler OutOfMemory Mysql
Hi,
I am trying to load a big table into Solr using DataImportHandler and MySQL.
I am getting an OutOfMemory error because Solr is trying to load the full
table. I have been reading different posts and tried batchSize="-1".
https://wiki.apache.org/solr/DataImportHandlerFaq
Do you have any idea what could be the issue?
Completely lost here.
Solr 6.4.1
mysql-connector-java-5.1.41-bin.jar
data-config
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://188.68.190.85:3306/jobsdb"
              user="suer"
              password="passowrd"/>
  <document>
    <entity name="jobsearch"
            pk="id"
            batchSize="-1"
            query="select * from job"
            deltaImportQuery="SELECT * from job WHERE id='${dih.delta.id}'"
            deltaQuery="SELECT id FROM job WHERE updated_at > '${dih.last_index_time}'">
      <field column="job_id" name="JobID"/>
      <field column="position" name="Position"/>
      <field column="employment_type" name="EmploymentType"/>
      <field column="description" name="Description"/>
      <field column="category" name="Category"/>
      <field column="apply_url" name="ApplyUrl"/>
      <field column="description_url" name="DescriptionUrl"/>
      <field column="company" name="Company"/>
      <field column="city" name="City"/>
      <field column="country_subdivision1" name="CountrySubdivision1"/>
      <field column="country_subdivision2" name="CountrySubdivision2"/>
      <field column="country" name="Country"/>
      <field column="source" name="Source"/>
      <field column="created_at" name="CreatedAt"/>
      <field column="updated_at" name="UpdatedAt"/>
    </entity>
  </document>
</dataConfig>
Thanks
Sergio
Re: DataImportHandler OutOfMemory Mysql
Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/1/2017 4:17 PM, marotosg wrote:
> I am trying to load a big table into Solr using DataImportHandler and Mysql.
> I am getting OutOfMemory error because Solr is trying to load the full
> table. I have been reading different posts and tried batchSize="-1".
> https://wiki.apache.org/solr/DataImportHandlerFaq
>
> Do you have any idea what could be the issue?
> Completely lost here.
>
> Solr.6.4.1
> mysql-connector-java-5.1.41-bin.jar
>
> data-config
>
> <dataSource type="JdbcDataSource"
> driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://188.68.190.85:3306/jobsdb"
> user="suer"
> password="passowrd"/>
> <document>
> <entity name="jobsearch"
> pk="id"
> batchSize="-1"
Setting batchSize to -1 is the proper solution, but you've got it in the
wrong place. It goes on dataSource, not on entity.
https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
When batchSize is -1, DIH executes setFetchSize(Integer.MIN_VALUE) on
the JDBC statement. This causes the MySQL JDBC driver to stream the
results instead of buffering them.
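A minimal sketch of that change, using the connection values from the original post. Only the position of batchSize differs from the posted config; the entity is shortened to a single field here purely for illustration.

```xml
<dataConfig>
  <!-- batchSize="-1" is set on the dataSource, not on the entity -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://188.68.190.85:3306/jobsdb"
              batchSize="-1"
              user="suer"
              password="passowrd"/>
  <document>
    <!-- batchSize removed from the entity; the delta queries and the
         other field mappings from the original config would stay as they were -->
    <entity name="jobsearch"
            pk="id"
            query="select * from job">
      <field column="job_id" name="JobID"/>
    </entity>
  </document>
</dataConfig>
```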
You should upgrade to 6.4.2 or 6.5.0. Versions 6.4.0 and 6.4.1 have a
serious performance bug.
https://issues.apache.org/jira/browse/SOLR-10130
You may also want to edit the maxMergeCount setting in the
mergeScheduler config and set it to at least 6. I ran into a problem with
the database disconnecting while importing millions of rows with DIH
from MySQL; this was the solution. See this thread:
http://lucene.472066.n3.nabble.com/Closed-connection-issue-while-doing-dataimport-td4327116.html
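For illustration, the mergeScheduler change might look like the following inside the indexConfig section of solrconfig.xml. This is a sketch under assumptions: the maxMergeCount value of 6 is the minimum suggested above, while the maxThreadCount entry and its value of 1 are not from this thread and are included only because the two settings are normally configured together. Adjust it for your hardware.

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- at least 6, per the advice above -->
    <int name="maxMergeCount">6</int>
    <!-- assumption: not stated in this thread; must not exceed maxMergeCount -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```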
Thanks,
Shawn
Re: DataImportHandler OutOfMemory Mysql
Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Sergio.
Have you tried Integer.MIN_VALUE (-2147483648)? See
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
--
Sincerely yours
Mikhail Khludnev