Posted to solr-user@lucene.apache.org by Lee Smith <le...@weblee.co.uk> on 2010/02/26 13:54:47 UTC
Content Extraction
Hey All
Hope someone can advise.
I followed the example in the wiki on how to extract an HTML page, i.e.:
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true' -F "myfile=@tutorial.html"
It returned an HTML page with a 404 error, and the document was not indexed.
Any suggestions on how I can fix this?
Thanks for any advice you can give.
Lee
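A minimal sketch for narrowing down the 404, assuming the default example server at http://localhost:8983/solr (adjust the host, the port, and, in a multicore setup, the core name in the path). It reruns the same wiki request but prints only the HTTP status code and saves the response body to a file for inspection. The script name and the helpers `build_extract_url` and `post_and_report` are hypothetical. A 404 generally means nothing answered at that URL path, so the path itself and whether an `/update/extract` request handler is registered in solrconfig.xml are worth checking.

```shell
#!/bin/sh
# Hypothetical helper: build the same ExtractingRequestHandler URL as the wiki example.
# The base URL passed in is an assumption; adjust host, port, and core path as needed.
build_extract_url() {
  base="$1"    # e.g. http://localhost:8983/solr
  doc_id="$2"  # value sent as literal.id
  printf '%s/update/extract?literal.id=%s&uprefix=attr_&fmap.content=attr_content&commit=true' "$base" "$doc_id"
}

# Post a file as multipart form data; print only the HTTP status code and
# save the response body to extract_response.html for a closer look.
post_and_report() {
  url="$1"
  file="$2"
  curl -s -o extract_response.html -w '%{http_code}\n' "$url" -F "myfile=@$file"
}

# Run only when given a base URL and a file, e.g.:
#   sh check_extract.sh http://localhost:8983/solr tutorial.html
if [ "$#" -eq 2 ]; then
  post_and_report "$(build_extract_url "$1" doc1)" "$2"
fi
```

If this prints 404, the saved body usually shows which server answered; a 200 followed by an empty index would point somewhere else entirely.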
Re: Content Extraction
Posted by Lee Smith <le...@weblee.co.uk>.
Hi Erick
I posted more details yesterday but got no response.
Here is a screenshot of what happens: http://screencast.com/t/MGRiZTU5M
After running it, I ran a query that returned 0 results, and I checked how many documents are indexed: the count was 0.
Hope you can shed some more light for me.
Lee
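One way to repeat the document count from the command line is to ask the select handler for every document while returning none (`q=*:*&rows=0`) and read `numFound` from the response. This is a minimal sketch, assuming the default example URL and Solr's default XML response format; `SOLR_BASE`, `num_found`, and `count_docs` are hypothetical names.

```shell
#!/bin/sh
# Assumed base URL of the example server; override by exporting SOLR_BASE.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Read an XML response on stdin and print the value of its numFound attribute.
num_found() {
  sed -n 's/.*numFound="\([0-9]*\)".*/\1/p'
}

# Match all documents but return none, so the response only carries the count.
count_docs() {
  curl -s "$SOLR_BASE/select?q=*:*&rows=0" | num_found
}

# Usage: sh count_docs.sh --run
if [ "${1:-}" = "--run" ]; then
  count_docs
fi
```

Running it before and after the extract request shows directly whether the commit added anything.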
On 26 Feb 2010, at 14:57, Erick Erickson wrote:
> You really have to provide more details of
> a> what you did.
> b> what the results were.
>
> Have you looked at your index with the admin page and/or Luke?
> Have you tried querying in the admin page?
> Have you examined the logs to see what they report?
>
> Best
> Erick
Re: Content Extraction
Posted by Erick Erickson <er...@gmail.com>.
You really have to provide more details of
a> what you did.
b> what the results were.
Have you looked at your index with the admin page and/or Luke?
Have you tried querying in the admin page?
Have you examined the logs to see what they report?
Best
Erick
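For the last question, a small sketch that filters a Solr log for lines mentioning the extract request and for errors. It assumes the log has been captured to a file: as far as I know, the example server started with `java -jar start.jar` logs to the console, so redirecting its output (e.g. `java -jar start.jar > solr.log 2>&1`) is one way to get one. The `scan_log` function and the `solr.log` file name are hypothetical. If the extract request never shows up in the log at all, that may suggest it did not reach a Solr request handler.

```shell
#!/bin/sh
# Print log lines that mention the extract path, SEVERE-level entries, or exceptions.
# The log file path is an assumption; pass whatever file you redirected output to.
scan_log() {
  grep -E '/update/extract|SEVERE|Exception' "$1"
}

# Usage: sh scan_log.sh solr.log
if [ -n "${1:-}" ]; then
  scan_log "$1"
fi
```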