Posted to solr-user@lucene.apache.org by Lee Smith <le...@weblee.co.uk> on 2010/02/26 13:54:47 UTC

Content Extraction

Hey All

Hope someone can advise.

I followed the example in the wiki on how to extract an HTML page, i.e.

curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true' -F "myfile=@tutorial.html"

It returned an HTML page with a 404 error and did not index the document.

Any suggestions on how I can fix this?

Thanks if you can advise.

Lee
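
To see the exact HTTP status and response body instead of a rendered error page, the same request can be sketched in Python. This is only a rough sketch: build_extract_url, post_file and main are hypothetical helper names, and it sends the file as a raw request body with a Content-Type header rather than as a multipart form like curl -F does, which assumes the extract handler on your install accepts that.

```python
import urllib.error
import urllib.parse
import urllib.request

# Host, port and path are taken from the curl command in the post.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(base, params):
    """Return the /update/extract URL with the given query parameters."""
    return base.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def post_file(url, path, content_type="text/html"):
    """POST the raw file body and return (status_code, response_text)."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error statuses such as 404 arrive here; keep the body,
        # since it may say more about the cause.
        return err.code, err.read().decode("utf-8", "replace")


def main():
    # Same parameters as the curl example; assumes tutorial.html is in
    # the current directory and Solr is running locally.
    url = build_extract_url(SOLR_BASE, {
        "literal.id": "doc1",
        "uprefix": "attr_",
        "fmap.content": "attr_content",
        "commit": "true",
    })
    status, text = post_file(url, "tutorial.html")
    print(status)
    print(text[:500])
```

Calling main() with Solr running prints the status code followed by the start of the response body, which is the kind of detail worth including in a follow-up post.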

Re: Content Extraction

Posted by Lee Smith <le...@weblee.co.uk>.

Hi Erick

I posted more details yesterday but got no response.

I have a screen shot of what it does: http://screencast.com/t/MGRiZTU5M

After running it, I ran a query, which returned 0 results, and checked how many docs are indexed, which is also 0.

Hope you can shed some more light for me.

Lee

On 26 Feb 2010, at 14:57, Erick Erickson wrote:

> You really have to provide more details of
> a> what you did.
> b> what the results were.
> 
> Have you looked at your index with the admin page and/or Luke?
> Have you tried querying in the admin page?
> Have you examined the logs to see what they report?
> 
> Best
> Erick

Re: Content Extraction

Posted by Erick Erickson <er...@gmail.com>.

You really have to provide more details of
a> what you did.
b> what the results were.

Have you looked at your index with the admin page and/or Luke?
Have you tried querying in the admin page?
Have you examined the logs to see what they report?

Best
Erick
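
One way to run the "is anything indexed" check from a script rather than the admin page is to ask the select handler for a match-all query and read the hit count. The sketch below assumes the default /select handler and the JSON response writer (wt=json) are enabled at the base URL from the original post; build_count_url, num_found and count_documents are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_count_url(base):
    """Return a match-all query URL that asks for zero rows in JSON."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return base.rstrip("/") + "/select?" + urllib.parse.urlencode(params)


def num_found(response_text):
    """Extract the total hit count from a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(base="http://localhost:8983/solr"):
    """Query a running Solr instance and return how many docs it holds."""
    with urllib.request.urlopen(build_count_url(base)) as response:
        return num_found(response.read().decode("utf-8"))
```

If count_documents() returns 0 right after the extract request, the document was not added, and the Solr logs are the next place to look.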