Posted to user@nutch.apache.org by PEEYUSH CHANDEL <cp...@gmail.com> on 2011/01/12 22:00:29 UTC

Connecting MySQL to Apache Nutch

I am using Apache Nutch for the first time. How can I store data in a MySQL
database after crawling, so that I can easily use the data in other web
applications?

I found a related question here

(http://stackoverflow.com/questions/3227259/nutch-mysql-integration),

but I don't clearly understand which part of the code is going to be
replaced by the MySQL connector. Please help with a short code example.

Thanks in advance

Re: Connecting MySQL to Apache Nutch

Posted by Iker Huerga <ik...@gmail.com>.
Hi,

I did it following the tutorial at [1], but using Nutch 2.0. It uses Apache
Gora as the ORM for data persistence.

I will try to follow the same procedure in Nutch 1.2 and will let you know.

[1]  http://techvineyard.blogspot.com/2010/12/build-nutch-20.html




-- 
Iker Huerga
http://www.linkatu.net

Re: Connecting MySQL to Apache Nutch

Posted by Arjun Kumar Reddy <ch...@iiitb.net>.
Hi all,

I have been trying to integrate the Nutch crawler with a MySQL database.
For this I am following the tutorial mentioned in the thread:
http://techvineyard.blogspot.com/2010/12/build-nutch-20.html

When I tried the step "right click on ivy/ivy.xml, 'Add Ivy Library ...'"
after modifying the mentioned files, I ran into problems.

'Ivy resolve job of ivy/ivy.xml in 'nutch'' has encountered a problem.
Impossible to resolve dependencies of org.apache.nutch#${ant.project.name};working@arjun-ninjas
  unresolved dependency: org.apache.gora#gora-core;0.1: not found
  unresolved dependency: org.apache.gora#gora-sql;0.1: not found
  unresolved dependency: org.restlet.jse#org.restlet;2.0.0: not found
  unresolved dependency: org.restlet.jse#org.restlet.ext.jackson;2.0.0: not found


I am attaching the screen shot
[image: Screenshot.png]

Could any

Thanks and regards,

Ch. Arjun Kumar Reddy,
International Institute of Information Technology – Bangalore (IIITB)




RE: Connecting MySQL to Apache Nutch

Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.
Hi list,

Regarding this recently discussed topic, I have been trying to integrate MySQL 5.2.31 with Nutch 1.2. The article on integration with 2.0 is straightforward, but it provides little value for use with Nutch 1.2. I have been using Solr 1.4.0 as the indexing mechanism (and have therefore been working with the SolrWriter.java class as opposed to LuceneWriter.java) and have tried to implement the code suggestions posted on this subject (including logging to try to debug it myself), but have unfortunately been unsuccessful to date. When I send the final index command to SolrIndex, I find that the crawl data is indexed into Solr instead of MySQL.

I am wondering if anyone can provide a use case and some other hints as to where I am going wrong and which files I need to configure before I am able to index to MySQL.

Please also tell me if you think this is an issue for the Solr list, and I will post there in an attempt to get results.

Thank you for any help.

Lewis


Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education’s Widening Participation Initiative of the Year 2009 and Herald Society’s Education Initiative of the Year 2009
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Re: Connecting MySQL to Apache Nutch

Posted by PEEYUSH CHANDEL <cp...@gmail.com>.
Hi Markus,

I tried the logger as well, but I still have the same problem.

Sorry if I am wrong, but I think this problem can be solved by changing the
LuceneWriter.java class, because by default the indexer in Nutch 1.2 is Lucene.


Re: Connecting MySQL to Apache Nutch

Posted by Markus Jelsma <ma...@openindex.io>.
Try using the logger, this way you can check hadoop.log for your output.

import:
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

declare:
public static Log LOG = LogFactory.getLog(SolrWriter.class);

use:
LOG.info("bla bla");






-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Connecting MySQL to Apache Nutch

Posted by PEEYUSH CHANDEL <cp...@gmail.com>.
Hi Markus,

Here is my modified SolrWriter class. Please check it and correct me
if I am doing something wrong.

I tried this code, but nothing happens.

package org.apache.nutch.indexer.solr;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;
import java.util.Iterator;
import java.sql.*;

import org.apache.hadoop.mapred.JobConf;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.indexer.NutchField;
import org.apache.nutch.indexer.NutchIndexWriter;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrWriter implements NutchIndexWriter {

  private SolrServer solr;
  private SolrMappingReader solrMapping;

  private final List<SolrInputDocument> inputDocs =
    new ArrayList<SolrInputDocument>();

  private int commitSize;

  public void open(JobConf job, String name) throws IOException {
    solr = new CommonsHttpSolrServer(job.get(SolrConstants.SERVER_URL));
    commitSize = job.getInt(SolrConstants.COMMIT_SIZE, 1000);
    solrMapping = SolrMappingReader.getInstance(job);
  }

  public void write(NutchDocument doc) throws IOException {
    final SolrInputDocument inputDoc = new SolrInputDocument();
    for(final Entry<String, NutchField> e : doc) {
      for (final Object val : e.getValue().getValues()) {
        inputDoc.addField(solrMapping.mapKey(e.getKey()), val, e.getValue().getWeight());
        String sCopy = solrMapping.mapCopyKey(e.getKey());
        if (sCopy != e.getKey()) {
          inputDoc.addField(sCopy, val, e.getValue().getWeight());
        }
      }
    }
    inputDoc.setDocumentBoost(doc.getWeight());
    inputDocs.add(inputDoc);

    // here is my modified code

    SolrInputDocument abc;
    Iterator it = inputDocs.iterator();
    while (it.hasNext()) {
      abc = (SolrInputDocument) it.next();
      String test = abc.toString();

      Connection conn = null;
      String url = "jdbc:mysql://localhost:3306/";
      String dbName = "data";
      String driver = "com.mysql.jdbc.Driver";
      String userName = "root";
      String password = "passwd";
      try {
        Class.forName(driver).newInstance();
        conn = DriverManager.getConnection(url + dbName, userName, password);
        System.out.println("Connected to the database");

        java.sql.Statement s = conn.createStatement();
        int r = s.executeUpdate("INSERT INTO data(data) VALUES('" + test + "')");

        System.out.println("Done");
        conn.close();
        System.out.println("Disconnected from database");
      } catch (Exception e) {
        System.out.println(e);
        System.exit(0);
      }
    }

    if (inputDocs.size() > commitSize) {
      try {
        solr.add(inputDocs);

      } catch (final SolrServerException e) {
        throw makeIOException(e);
      }
      inputDocs.clear();
    }
  }

  public void close() throws IOException {
    try {
      if (!inputDocs.isEmpty()) {
        solr.add(inputDocs);
        inputDocs.clear();
      }
      // solr.commit();
    } catch (final SolrServerException e) {
      throw makeIOException(e);
    }
  }

  public static IOException makeIOException(SolrServerException e) {
    final IOException ioe = new IOException();
    ioe.initCause(e);
    return ioe;
  }

}

-Thank you very much
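
Whatever the reason nothing appeared to happen, a few things in the class above would work against it: the loop walks the whole inputDocs buffer on every write() call, so earlier documents would be inserted again each time; a new connection is opened for every row; the value is concatenated straight into the SQL string; and System.exit(0) ends the job with a success code when an exception occurs. The following is a minimal sketch of a different structure, not Nutch code. RowSink, JdbcRowSink and OncePerDocumentWriter are hypothetical names, and the table data(data) is taken from the post. Only the java.sql calls are standard JDBC.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical abstraction so the writer logic can be tried without a database. */
interface RowSink {
  void insert(String data) throws Exception;
  void close() throws Exception;
}

/** JDBC-backed sink: one connection and one prepared statement for the whole run. */
final class JdbcRowSink implements RowSink {
  // Table and column names follow the post; the value is bound, not concatenated.
  private static final String SQL = "INSERT INTO data(data) VALUES (?)";
  private final Connection conn;
  private final PreparedStatement stmt;

  JdbcRowSink(String url, String user, String password) throws SQLException {
    Connection c = DriverManager.getConnection(url, user, password);
    PreparedStatement s;
    try {
      s = c.prepareStatement(SQL);
    } catch (SQLException e) {
      c.close(); // do not leak the connection if the statement cannot be prepared
      throw e;
    }
    this.conn = c;
    this.stmt = s;
  }

  @Override
  public void insert(String data) throws SQLException {
    stmt.setString(1, data);
    stmt.executeUpdate();
  }

  @Override
  public void close() throws SQLException {
    try {
      stmt.close();
    } finally {
      conn.close();
    }
  }
}

/** Stand-in for the index writer: each document is written exactly once. */
final class OncePerDocumentWriter {
  private final RowSink sink;

  OncePerDocumentWriter(RowSink sink) {
    this.sink = sink;
  }

  void write(String doc) throws IOException {
    try {
      sink.insert(doc); // only the current document, not the whole buffer
    } catch (Exception e) {
      // Surface the failure to the caller instead of calling System.exit(0).
      throw new IOException("insert failed", e);
    }
  }

  void close() throws IOException {
    try {
      sink.close();
    } catch (Exception e) {
      throw new IOException("close failed", e);
    }
  }
}
```

In a real SolrWriter the sink would presumably be created in open() and closed in close(), with write() passing on only the document it was just given. That mapping is an assumption about how the pieces fit together, not something confirmed in this thread.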


Re: Connecting MySQL to Apache Nutch

Posted by Markus Jelsma <ma...@openindex.io>.
public void write gets called for each NutchDocument and collects them in
inputDocs. You could, after line 60, call a custom method to read all fields
and create an SQL insert statement out of them.
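
One way to read this suggestion, as a minimal sketch rather than actual Nutch code: flatten the document's fields into a column-to-value map and build a parameterized INSERT from it. The class name SqlInsertSketch, the methods buildInsertSql and insertRow, and the table and column names used with them are all hypothetical. Only the java.sql calls are standard JDBC. Binding values through PreparedStatement placeholders avoids quoting problems when page text contains apostrophes.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import java.util.StringJoiner;

final class SqlInsertSketch {

  /**
   * Builds e.g. "INSERT INTO pages (url, title) VALUES (?, ?)".
   * Table and column names cannot be bound as parameters, so they must come
   * from trusted configuration, never from crawled content.
   */
  public static String buildInsertSql(String table, Iterable<String> columns) {
    StringJoiner cols = new StringJoiner(", ");
    StringJoiner marks = new StringJoiner(", ");
    for (String column : columns) {
      cols.add(column);
      marks.add("?");
    }
    return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
  }

  /**
   * Inserts one row. The map should have a stable iteration order
   * (for example a LinkedHashMap) so columns and values line up predictably.
   */
  public static void insertRow(Connection conn, String table, Map<String, String> row)
      throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(table, row.keySet()))) {
      int index = 1; // JDBC parameter indexes start at 1
      for (String value : row.values()) {
        ps.setString(index++, value);
      }
      ps.executeUpdate();
    }
  }
}
```

In SolrWriter.write this could be called once per NutchDocument with a map filled from the document's fields. How multi-valued fields map onto columns is left open here.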

On Thursday 13 January 2011 13:55:14 PEEYUSH CHANDEL wrote:
> Hi Markus,
> 
> I tried to modify the SolrWriter.java class and place my MySQL connector
> there, but nothing happens. Could you please explain a little more, with a
> code example, exactly which part of the SolrWriter class is going to be
> replaced by the MySQL connector?
> 
> Thank you very much
> 
> On 1/13/11, Markus Jelsma <ma...@openindex.io> wrote:
> > Here's the class you need to look at:
> > http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup
> > 
> >> Modifying the Solr index writer to use a MySQL connector is surely the
> >> easiest short cut.
> >> 
> >> > hi O.Klein
> >> > 
> >> > thanks for the answer but i am using nutch 1.2 so any solution for
> >> > this version.
> >> > 
> >> > On 1/13/11, O. Klein <kl...@octoweb.nl> wrote:
> >> > > Nutch 2.0 supports storage of data in MySQL DB.
> >> > > 
> >> > > But that version is not for production yet.
> >> > > 
> >> > > Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html
> >> > > on how to get it running.
> >> > > --
> >> > > View this message in context:
> >> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
> >> > > Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Connecting MySQL to Apache Nutch

Posted by Iker Huerga <ik...@gmail.com>.
Hi,

I am working on the same issue. If you make any progress, please let me know;
I will do the same.

Thanks


2011/1/13 PEEYUSH CHANDEL <cp...@gmail.com>

> thanks markus for reply i am trying and let you know that what happen :)
>
> On 1/13/11, Markus Jelsma <ma...@openindex.io> wrote:
> > Here's the class you need to look at:
> > http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup
> >
> >
> >> Modifying the Solr index writer to use a MySQL connector is surely the
> >> easiest short cut.
> >>
> >> > hi O.Klein
> >> >
> >> > thanks for the answer but i am using nutch 1.2 so any solution for this
> >> > version.
> >> >
> >> > On 1/13/11, O. Klein <kl...@octoweb.nl> wrote:
> >> > > Nutch 2.0 supports storage of data in MySQL DB.
> >> > >
> >> > > But that version is not for production yet.
> >> > >
> >> > > Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on
> >> > > how to get it running.
> >> > > --
> >> > > View this message in context:
> >> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
> >> > > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
>



-- 
Iker Huerga
http://www.linkatu.net

Re: Connecting MySQL to Apache Nutch

Posted by PEEYUSH CHANDEL <cp...@gmail.com>.
Thanks for the reply, Markus. I am trying it and will let you know what happens :)

On 1/13/11, Markus Jelsma <ma...@openindex.io> wrote:
> Here's the class you need to look at:
> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup
>
>
>> Modifying the Solr index writer to use a MySQL connector is surely the
>> easiest short cut.
>>
>> > hi O.Klein
>> >
>> > thanks for the answer but i am using nutch 1.2 so any solution for this
>> > version.
>> >
>> > On 1/13/11, O. Klein <kl...@octoweb.nl> wrote:
>> > > Nutch 2.0 supports storage of data in MySQL DB.
>> > >
>> > > But that version is not for production yet.
>> > >
>> > > Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on
>> > > how to get it running.
>> > > --
>> > > View this message in context:
>> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
>> > > Sent from the Nutch - User mailing list archive at Nabble.com.
>

Re: Connecting MySQL to Apache Nutch

Posted by PEEYUSH CHANDEL <cp...@gmail.com>.
Hi Markus,

I tried to modify the SolrWriter.java class and put my MySQL connector
there, but nothing happens. Could you please explain a little more, with a
code example, exactly which part of the SolrWriter class is going to be
replaced by the MySQL connector?

-Thank you very much

On 1/13/11, Markus Jelsma <ma...@openindex.io> wrote:
> Here's the class you need to look at:
> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup
>
>
>> Modifying the Solr index writer to use a MySQL connector is surely the
>> easiest short cut.
>>
>> > hi O.Klein
>> >
>> > thanks for the answer but i am using nutch 1.2 so any solution for this
>> > version.
>> >
>> > On 1/13/11, O. Klein <kl...@octoweb.nl> wrote:
>> > > Nutch 2.0 supports storage of data in MySQL DB.
>> > >
>> > > But that version is not for production yet.
>> > >
>> > > Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on
>> > > how to get it running.
>> > > --
>> > > View this message in context:
>> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
>> > > Sent from the Nutch - User mailing list archive at Nabble.com.
>

Re: Connecting MySQL to Apache Nutch

Posted by Markus Jelsma <ma...@openindex.io>.
Here's the class you need to look at:
http://svn.apache.org/viewvc/nutch/branches/branch-1.2/src/java/org/apache/nutch/indexer/solr/SolrWriter.java?view=markup


> Modifying the Solr index writer to use a MySQL connector is surely the
> easiest short cut.
> 
> > hi O.Klein
> > 
> > thanks for the answer but i am using nutch 1.2 so any solution for this
> > version.
> > 
> > On 1/13/11, O. Klein <kl...@octoweb.nl> wrote:
> > > Nutch 2.0 supports storage of data in MySQL DB.
> > > 
> > > But that version is not for production yet.
> > > 
> > > Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on
> > > how to get it running.
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
> > > Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Connecting MySQL to Apache Nutch

Posted by PEEYUSH CHANDEL <cp...@gmail.com>.
I am using Apache Nutch for the first time. How can I store data in a MySQL
database after crawling?

I am using Nutch 1.2.

I want to be able to easily use the data in other web applications.

I found a related question here

(http://stackoverflow.com/questions/3227259/nutch-mysql-integration),

but I don't clearly understand which part of the code is going to be replaced
by the MySQL connector. Please help with a short code example.

-Thanks in advance

Re: Connecting MySQL to Apache Nutch

Posted by Markus Jelsma <ma...@openindex.io>.
Modifying the Solr index writer to use a MySQL connector is surely the easiest 
short cut.
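
As a rough illustration of what the MySQL side of such a modified writer might
do, here is a minimal JDBC sketch. Everything specific in it is an assumption
made for the example: the class name JdbcBatchSketch, the batch size, and the
"pages" table with url, title and content columns are hypothetical and do not
come from Nutch. It buffers one row per document and sends the buffer to the
database as a single JDBC batch, which is usually cheaper than one round trip
per row.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: buffers rows and writes them to a
// "pages" table through JDBC in batches.
class JdbcBatchSketch {

    // Assumed table layout; adjust to the real schema.
    static final String INSERT_SQL =
        "INSERT INTO pages (url, title, content) VALUES (?, ?, ?)";

    private final List<String[]> pending = new ArrayList<String[]>();
    private final int batchSize;

    JdbcBatchSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffers one row. Returns true once the buffer has reached batchSize,
    // which is the caller's cue to call flush.
    boolean add(String url, String title, String content) {
        pending.add(new String[] { url, title, content });
        return pending.size() >= batchSize;
    }

    int pendingCount() {
        return pending.size();
    }

    // Sends all buffered rows as one JDBC batch, then clears the buffer.
    void flush(Connection conn) throws SQLException {
        if (pending.isEmpty()) {
            return;
        }
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : pending) {
                for (int i = 0; i < row.length; i++) {
                    ps.setString(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
        pending.clear();
    }
}
```

The Connection would typically come from DriverManager.getConnection with a
jdbc:mysql:// URL and the MySQL Connector/J jar on the classpath. Remember to
call flush one last time when the writer is closed, so that a partially filled
buffer is not lost.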

> hi O.Klein
> 
> thanks for the answer but i am using nutch 1.2 so any solution for this
> version.
> 
> On 1/13/11, O. Klein <kl...@octoweb.nl> wrote:
> > Nutch 2.0 supports storage of data in MySQL DB.
> > 
> > But that version is not for production yet.
> > 
> > Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on how
> > to get it running.
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Connecting MySQL to Apache Nutch

Posted by PEEYUSH CHANDEL <cp...@gmail.com>.
Hi O. Klein,

Thanks for the answer, but I am using Nutch 1.2. Is there any solution for
this version?

On 1/13/11, O. Klein <kl...@octoweb.nl> wrote:
>
> Nutch 2.0 supports storage of data in MySQL DB.
>
> But that version is not for production yet.
>
> Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on how to
> get it running.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

Re: Connecting MySQL to Apache Nutch

Posted by "O. Klein" <kl...@octoweb.nl>.
Nutch 2.0 supports storing data in a MySQL database.

But that version is not ready for production yet.

Check http://techvineyard.blogspot.com/2010/12/build-nutch-20.html on how to
get it running.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Connecting-MySQL-to-Apache-Nutch-tp2243983p2244263.html
Sent from the Nutch - User mailing list archive at Nabble.com.