Posted to solr-user@lucene.apache.org by deniz <de...@gmail.com> on 2018/09/06 07:52:11 UTC

Concurrent Update Client Stops on Exceptions Randomly v7.4

I am trying to write a wrapper for DIH so that I can leverage field type
guessing while importing SQL data.

The query is supposed to retrieve 400K+ documents. The test data in the
database contains dirty date fields, with values like '1966-00-00' or
'1987-10-00'.

I am running the code below:

    public void dataimport(ConcurrentUpdateSolrClient updateClient, String importSql) {
        // try-with-resources closes the statement and connection even on failure
        try (Connection conn = DriverManager.getConnection("connection string", "user", "pass");
             Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            // With MySQL Connector/J, Integer.MIN_VALUE requests row-by-row streaming
            stmt.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = stmt.executeQuery(importSql);
            ResultSetMetaData resultSetMetaData = rs.getMetaData();

            // JDBC columns are 1-based, so the loop must include getColumnCount()
            List<SolrFieldObject> fields = new ArrayList<>();
            for (int index = 1; index <= resultSetMetaData.getColumnCount(); index++) {
                fields.add(new SolrFieldObject(resultSetMetaData.getColumnLabel(index),
                                               resultSetMetaData.getColumnClassName(index)));
            }

            while (rs.next()) {
                // Build one document per row, skipping NULL columns
                SolrInputDocument solrInputDocument = new SolrInputDocument();
                for (SolrFieldObject field : fields) {
                    try {
                        String value = rs.getString(field.name());
                        if (value != null) {
                            solrInputDocument.addField(field.name(), value);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }

                // Send each document in its own request
                try {
                    UpdateRequest updateRequest = new UpdateRequest();
                    updateRequest.setCommitWithin(10000);
                    updateRequest.add(solrInputDocument);
                    updateRequest.process(updateClient);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

The code works, except that it randomly stops with a log message like
"Error adding field 'day'='1976-00-00' msg=Invalid Date String:'1976-00-00'".
Many other documents with invalid dates are only logged as errors on the
server side while the client keeps pushing documents; then, on some
apparently random document, the client stops with the error above.

Is there an error threshold that makes the concurrent update client stop
after a certain number of failures? Or is there something else I am missing
when dealing with this kind of update?




Re: Concurrent Update Client Stops on Exceptions Randomly v7.4

Posted by Erick Erickson <er...@gmail.com>.
I would seriously consider moving away from DIH to SolrJ if you want
to tweak on this level, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

One other alternative is to incorporate a ScriptUpdateProcessor in
your update chain to intercept these on the way in to being indexed
and "do something" to fix it up.
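
As a client-side variant of the same "fix it up before indexing" idea, here
is a minimal sketch that validates a date string with java.time before it is
added to a document. The class name DateSanitizer, the method toSolrDate, and
the choice to drop invalid values (rather than repair them) are assumptions
for illustration, not something from the original code; the output format
assumes the target field expects a full ISO-8601 UTC timestamp.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

// Hypothetical helper: not part of SolrJ or of the original poster's code.
final class DateSanitizer {

    // STRICT resolution rejects month 00, day 00, and impossible dates such as Feb 30.
    private static final DateTimeFormatter STRICT_DATE =
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    private DateSanitizer() {
    }

    // Returns the date as a UTC midnight timestamp, or empty if the input is not a real date.
    static Optional<String> toSolrDate(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            LocalDate date = LocalDate.parse(raw.trim(), STRICT_DATE);
            return Optional.of(date + "T00:00:00Z");
        } catch (DateTimeParseException e) {
            // Dirty values like '1976-00-00' end up here and are skipped.
            return Optional.empty();
        }
    }
}
```

In the import loop, the value of a date column could be passed through
toSolrDate and the field added only when a value is present, so the bad
dates never reach the server.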

ConcurrentUpdateSolrClient shouldn't "just quit"; I'd guess the problem is
something in DIH.

Best,
Erick