Posted to user@nutch.apache.org by mina <ta...@gmail.com> on 2012/01/31 10:51:06 UTC

Why doesn't Nutch crawl Arabic sites well?

I can crawl an Arabic site like: http://www.sahafa.com/
but I can't crawl another site like: http://www.aljazeera.net/Portal/
Please help me.

--
View this message in context: http://lucene.472066.n3.nabble.com/why-nutch-dosen-t-crawl-Arabic-sites-well-tp3702769p3702769.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Why doesn't Nutch crawl Arabic sites well?

Posted by remi tassing <ta...@gmail.com>.
Try the readdb command described at [1]. It'll export all the URLs that were crawled.
[1] http://wiki.apache.org/nutch/bin/nutch_readdb
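
Once the CrawlDb has been dumped to text (for example with
bin/nutch readdb crawl/crawldb -dump crawldb-dump, where both paths are
placeholders for your own), a short script can show which URLs were actually
fetched. The following is a minimal Python sketch, not part of Nutch. It
assumes each dump record starts with a line beginning with the URL, followed
by a line of the form "Status: 2 (db_fetched)"; check that against your own
dump before relying on it. The function names and the sample record are made
up for illustration.

```python
import re
from collections import Counter

# Assumed record layout in the text dump (verify against your own dump):
#   <url>\tVersion: <n>
#   Status: <code> (<status_name>)
STATUS_RE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def statuses_by_url(dump_text):
    """Map each URL found in the dump text to its status name."""
    result = {}
    current_url = None
    for line in dump_text.splitlines():
        if line.startswith(("http://", "https://")):
            # The URL is the first whitespace-separated token of the record.
            current_url = line.split()[0]
        else:
            match = STATUS_RE.match(line.strip())
            if match and current_url is not None:
                result[current_url] = match.group(1)
    return result


def count_statuses(dump_text):
    """Count how many URLs are in each status."""
    return Counter(statuses_by_url(dump_text).values())


# Hypothetical sample in the assumed layout, not real crawl output.
SAMPLE = (
    "http://example.com/a\tVersion: 7\n"
    "Status: 2 (db_fetched)\n"
    "\n"
    "http://example.com/b\tVersion: 7\n"
    "Status: 1 (db_unfetched)\n"
)

if __name__ == "__main__":
    for url, status in sorted(statuses_by_url(SAMPLE).items()):
        print(status, url)
```

A URL that is missing from the dump, or that never reaches db_fetched, is the
place to start looking.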

Remi


On Wednesday, February 1, 2012, mina <ta...@gmail.com> wrote:
> I have no errors in my log. Does Nutch have a problem crawling Arabic sites?
> Please help me.
>
> On 1/31/12, remi tassing [via Lucene]
> <ml...@n3.nabble.com> wrote:
>>
>>
>> Check your log for any errors.
>>
>> On Tuesday, January 31, 2012, Markus Jelsma <ma...@openindex.io>
>> wrote:
>>> By the way, please don't send every message twice or more.
>>>
>>> On Tuesday 31 January 2012 10:51:06 mina wrote:
>>>> I can crawl an Arabic site like: http://www.sahafa.com/
>>>> but I can't crawl another site like: http://www.aljazeera.net/Portal/
>>>> Please help me.
>>>
>>> --
>>> Markus Jelsma - CTO - Openindex

Re: Why doesn't Nutch crawl Arabic sites well?

Posted by mina <ta...@gmail.com>.
I have no errors in my log. Does Nutch have a problem crawling Arabic sites? Please help me.

On 1/31/12, remi tassing [via Lucene]
<ml...@n3.nabble.com> wrote:
>
>
> Check your log for any errors.
>
> On Tuesday, January 31, 2012, Markus Jelsma <ma...@openindex.io>
> wrote:
>> By the way, please don't send every message twice or more.
>>
>> On Tuesday 31 January 2012 10:51:06 mina wrote:
>>> I can crawl an Arabic site like: http://www.sahafa.com/
>>> but I can't crawl another site like: http://www.aljazeera.net/Portal/
>>> Please help me.
>>
>> --
>> Markus Jelsma - CTO - Openindex


--
View this message in context: http://lucene.472066.n3.nabble.com/why-nutch-dosen-t-crawl-Arabic-sites-well-tp3702769p3705819.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Why doesn't Nutch crawl Arabic sites well?

Posted by remi tassing <ta...@gmail.com>.
Check your log for any errors.

On Tuesday, January 31, 2012, Markus Jelsma <ma...@openindex.io>
wrote:
> By the way, please don't send every message twice or more.
>
> On Tuesday 31 January 2012 10:51:06 mina wrote:
>> I can crawl an Arabic site like: http://www.sahafa.com/
>> but I can't crawl another site like: http://www.aljazeera.net/Portal/
>> Please help me.
>
> --
> Markus Jelsma - CTO - Openindex
>

Re: Why doesn't Nutch crawl Arabic sites well?

Posted by Markus Jelsma <ma...@openindex.io>.
By the way, please don't send every message twice or more. 

On Tuesday 31 January 2012 10:51:06 mina wrote:
> I can crawl an Arabic site like: http://www.sahafa.com/
> but I can't crawl another site like: http://www.aljazeera.net/Portal/
> Please help me.

-- 
Markus Jelsma - CTO - Openindex