Posted to solr-user@lucene.apache.org by Michael Dockery <do...@yahoo.com> on 2011/09/12 02:12:41 UTC

select query does not find indexed pdf document

I am new to Solr.

I tried to upload a PDF file via curl to my Solr webapp (on Tomcat):

curl "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdf&commit=true"



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">860</int></lst>
</response>


but the query

http://www/SearchApp/select/?q=vpn

does not find the document:


<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="q">vpn</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>


Any help is appreciated.

=================================================
FYI: I point my test webapp to the index/Solr home by modifying meta-data/context.xml:
<Context crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="c:/solr_home" override="true" />
</Context>

and I had to copy all the jars in Solr_download\contrib\extraction\lib to my webapp's lib dir (to avoid ClassNotFound errors).
In the future I plan to put them in the tomcat/lib dir.
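Instead of copying the extraction jars into the webapp (or into tomcat/lib), Solr 1.4 and later can load plugin jars through <lib> directives in solrconfig.xml. A minimal sketch, modeled on the stock example solrconfig.xml; C:/Solr_download is a hypothetical stand-in for wherever the download was actually unpacked, and the jar-name pattern assumes 3.x naming:

```xml
<!-- solrconfig.xml, directly inside the top-level <config> element -->
<!-- hypothetical path: Tika and its dependencies -->
<lib dir="C:/Solr_download/contrib/extraction/lib" />
<!-- hypothetical path: the Solr Cell jar that contains ExtractingRequestHandler,
     matched by file name (assumes apache-solr-cell-<version>.jar naming) -->
<lib dir="C:/Solr_download/dist/" regex="apache-solr-cell-\d.*\.jar" />
```

Relative dir values are resolved against the core's instance directory (here presumably c:/solr_home), so the paths could also be written relative to that.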


Also, I have not modified conf\solrconfig.xml or schema.xml.

Re: select query does not find indexed pdf document

Posted by Erick Erickson <er...@gmail.com>.

You can use <copyField> to put data from separate fields into a common
search field.
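A minimal schema.xml sketch of that, assuming the stock example schema (which has a multiValued catch-all text field and a title field) plus a hypothetical filename field that a request could fill with the ExtractingRequestHandler parameter literal.filename=...:

```xml
<!-- inside <fields>: hypothetical field holding the original file name -->
<field name="filename" type="string" indexed="true" stored="true"/>

<!-- top level of <schema>, outside <fields>: copy both values into the
     catch-all "text" field, so a plain q=... also searches them.
     "text" should be multiValued since several sources feed it. -->
<copyField source="filename" dest="text"/>
<copyField source="title" dest="text"/>
```

The extracted body text itself usually does not need a copyField, since the stock /update/extract handler already maps it to text (fmap.content=text). Schema changes only apply to documents indexed afterwards, so the PDF would have to be posted again.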

This page will help you get started on what mods you'd need to make to a <fieldType> to analyze it as you wish:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

But as a start, think about WhitespaceTokenizer followed by
LowerCaseFilterFactory
ASCIIFoldingFilterFactory
NGramFilterFactory
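As a schema.xml field type, that chain might look like the sketch below. The type name text_ngram and the gram sizes are illustrative assumptions. It also runs NGramFilterFactory at index time only, a common variation: n-gramming the query as well tends to match many more documents than intended.

```xml
<!-- inside <types>: hypothetical type name and gram sizes -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- split on whitespace only, so tokens like numb3rs stay whole -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- accent-insensitive matching: folds e.g. é to e -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- index every 3 to 15 character substring, so "vpn" can match inside "dmvpn" -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- same chain without the n-grams -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With this setup a query term shorter than minGramSize or longer than maxGramSize will not match anything, so the sizes should be chosen with typical search terms in mind.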


Pay attention to the note at the top that directs you to the full list; the page above contains only a partial list. For instance, NGramFilterFactory isn't on that page, it's on the page that's linked to: http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html

Best
Erick

On Tue, Sep 13, 2011 at 10:46 PM, Michael Dockery
<do...@yahoo.com> wrote:
> Thank you for your informative reply.
>
> I would like to start simple by combining both filename and content
>   into the same default search field
>    ...which my default schema xml calls  "text"
> ...
> <defaultSearchField>text</defaultSearchField>
> ...
>
> also:
> -case and accent insensitive
> -no splits on numb3rs
> -no highlights
> -text processing same for index and search
>
> However, I would like:
> -ngrams, preferably (partial/prefix word/token search)
>
>
> What schema mods would be needed?
>
> Also, what curl syntax should I use to submit/index a PDF (with filename and content combined into the default search field)?
>
>
>
> ________________________________
> From: Bob Sandiford <bo...@sirsidynix.com>
> To: Michael Dockery <do...@yahoo.com>
> Cc: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Sent: Monday, September 12, 2011 1:38 PM
> Subject: RE: select query does not find indexed pdf document
>
> Hi, Michael.
>
> Well, the stock answer is, 'it depends'
>
> For example - would you want to be able to search filename without searching file contents, or would you always search both of them together?  If both, then copy both the file name and the parsed file content from the pdf into a single search field, and you can set that up as the default search field.
>
> Or - what kind of processing / normalizing do you want on this data?  Case insensitive?  Accent insensitive?  If a 'word' contains camel case (e.g. TheVeryIdea), do you want that split on the case changes?  (but then watch out for things like "iPad")  If a 'word' contains numbers, do you want them left together, or separated?  Do you want stemming (where searching for 'stemming' would also find 'stem', 'stemmed', that sort of thing)?  Is this always English, or are other languages involved?  Do you want the text processing to be the same for indexing vs searching?  Do you want to be able to find hits based on the first few characters of a term?  (ngrams)
>
> Do you want to be able to highlight text segments where the search terms were found?
>
> Probably you want to read up on the various tokenizers and filters that are available.  Do some prototyping and see how it looks.
>
> Here's a starting point: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Basically, there is no 'one size fits all' here.  Part of the power of Solr / Lucene is its configurability to achieve the results your business case calls for.  Part of the drawback of Solr / Lucene - especially for new folks - is its configurability to achieve the results your business case calls for. :)
>
> Anyone got anything else to suggest for Michael?
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | Bob.Sandiford@sirsidynix.com
> www.sirsidynix.com
>
> From: Michael Dockery [mailto:dockeryjavaman@yahoo.com]
> Sent: Monday, September 12, 2011 1:18 PM
> To: Bob Sandiford
> Subject: Re: select query does not find indexed pdf document
>
> Thank you, that worked.
>
> Any tips for a very, very basic setup of the schema XML?
> ...or is the default basic enough?
>
> I basically only want to search on
> filename and file contents.
>
>
> From: Bob Sandiford <bo...@sirsidynix.com>
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>; Michael Dockery <do...@yahoo.com>
> Sent: Monday, September 12, 2011 10:04 AM
> Subject: RE: select query does not find indexed pdf document
>
> Um - looks like you specified your id value as "pdfy", which is reflected in the results from the "*:*" query, but your id query is searching for "vpn", hence no matches...
>
> What does this query yield?
>
> http://www/SearchApp/select/?q=id:pdfy
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | Bob.Sandiford@sirsidynix.com
> www.sirsidynix.com
>
>> -----Original Message-----
>> From: Michael Dockery [mailto:dockeryjavaman@yahoo.com]
>> Sent: Monday, September 12, 2011 9:56 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: select query does not find indexed pdf document
>>
>> http://www/SearchApp/select/?q=id:vpn
>>
>> yields this:
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">15</int>
>>     <lst name="params">
>>       <str name="q">id:vpn</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="0" start="0"/>
>> </response>
>>
>>
>> *****************************************
>>
>> http://www/SearchApp/select/?q=*:*
>>
>> yields this:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">16</int>
>>     <lst name="params">
>>       <str name="q">*.*</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="1" start="0">
>>     <doc>
>>       <str name="author">doc</str>
>>       <arr name="content_type">
>>         <str>application/pdf</str>
>>       </arr>
>>       <str name="id">pdfy</str>
>>       <date name="last_modified">2011-05-20T02:08:48Z</date>
>>       <arr name="title">
>>         <str>dmvpndeploy.pdf</str>
>>       </arr>
>>     </doc>
>>   </result>
>> </response>
>>
>>
>> From: Jan Høydahl <ja...@cominvent.com>
>> To: solr-user@lucene.apache.org; Michael Dockery <do...@yahoo.com>
>> Sent: Monday, September 12, 2011 4:59 AM
>> Subject: Re: select query does not find indexed pdf document
>>
>> Hi,
>>
>> What do you get from a query http://www/SearchApp/select/?q=*:* or
>> http://www/SearchApp/select/?q=id:vpn ?
>> You may not have mapped the fields correctly to your schema?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 12. sep. 2011, at 02:12, Michael Dockery wrote:
>>
>> > I am new to solr.
>> >
>> > I tried to upload a pdf file via curl to my solr webapp (on tomcat)
>> >
>> > curl "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdfy&commit=true"
>> >
>> >
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <response>
>> > <lst name="responseHeader"><int name="status">0</int><int name="QTime">860</int></lst>
>> > </response>
>> >
>> >
>> > but
>> >
>> > http://www/SearchApp/select/?q=vpn
>> >
>> >
>> > does not find the document
>> >
>> >
>> > <response>
>> > <lst name="responseHeader">
>> > <int name="status">0</int>
>> > <int name="QTime">0</int>
>> > <lst name="params">
>> > <str name="q">vpn</str>
>> > </lst>
>> > </lst>
>> > <result name="response" numFound="0" start="0"/>
>> > </response>
>> >
>> >
>> > help is appreciated.
>> >
>> > =================================================
>> > fyi
>> > I point my test webapp to the index/solr home via mod meta-data/context.xml
>> > <Context crossContext="true" >
>> >    <Environment name="solr/home" type="java.lang.String"
>> >  value="c:/solr_home" override="true" />
>> >
>> > and I had to copy all these jars to my webapp lib dir: (to avoid the classnotfound)
>> > Solr_download\contrib\extraction\lib
>> >  ...in the future i plan to put them in the tomcat/lib dir.
>> >
>> >
>> > Also, I have not modified conf\solrconfig.xml or schema.xml.

Re: select query does not find indexed pdf document

Posted by Michael Dockery <do...@yahoo.com>.

Thank you for your informative reply.

I would like to start simple by combining both filename and content 
  into the same default search field
   ...which my default schema xml calls  "text"
...
<defaultSearchField>text</defaultSearchField>
...

also:
-case and accent insensitive
-no splits on numb3rs
-no highlights 
-text processing same for index and search

However, I would like:
-ngrams, preferably (partial/prefix word/token search)

What schema mods would be needed?

Also, what curl syntax should I use to submit/index a PDF (with filename and content combined into the default search field)?
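A hedged sketch of the field-level change in schema.xml: keep text as the default search field, but give it an n-gram analyzed type. text_ngram is a hypothetical type name; it would need its own <fieldType> definition in <types>.

```xml
<!-- inside <fields>: replaces the stock "text" field definition -->
<!-- stored="false": fine when no highlighting is needed (highlighting needs stored text) -->
<!-- multiValued="true": file name and content both land in this one field -->
<field name="text" type="text_ngram" indexed="true" stored="false" multiValued="true"/>

<!-- top level of <schema>, unchanged: plain q=... queries search "text" -->
<defaultSearchField>text</defaultSearchField>
```

After changing the field type, the existing PDF has to be deleted and posted again, since documents already in the index keep the analysis they were indexed with.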

RE: select query does not find indexed pdf document

Posted by Bob Sandiford <bo...@sirsidynix.com>.

Hi, Michael.

Well, the stock answer is, 'it depends'

For example - would you want to be able to search filename without searching file contents, or would you always search both of them together?  If both, then copy both the file name and the parsed file content from the pdf into a single search field, and you can set that up as the default search field.

Or - what kind of processing / normalizing do you want on this data?  Case insensitive?  Accent insensitive?  If a 'word' contains camel case (e.g. TheVeryIdea), do you want that split on the case changes?  (but then watch out for things like "iPad")  If a 'word' contains numbers, do you want them left together, or separated?  Do you want stemming (where searching for 'stemming' would also find 'stem', 'stemmed', that sort of thing)?  Is this always English, or are other languages involved?  Do you want the text processing to be the same for indexing vs searching?  Do you want to be able to find hits based on the first few characters of a term?  (ngrams)
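Most of those choices map onto filter attributes in schema.xml. The hypothetical text_split type below only illustrates the knobs mentioned above: camel-case splitting, keeping letter/digit runs together, and English stemming. Because it uses a single <analyzer> with no type attribute, index and query text go through the same chain.

```xml
<!-- inside <types>: hypothetical type, for illustration only -->
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <!-- no type attribute: same chain at index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnCaseChange="1": TheVeryIdea -> The, Very, Idea (and iPad -> i, Pad)
         splitOnNumerics="0": numb3rs stays a single token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnCaseChange="1" splitOnNumerics="0"/>
    <!-- lowercase only after splitting, because the split relies on case -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- English stemming: stemming, stemmed -> stem -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Setting splitOnCaseChange="0" instead avoids the iPad problem at the cost of not splitting TheVeryIdea.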

Do you want to be able to highlight text segments where the search terms were found?

Probably you want to read up on the various tokenizers and filters that are available.  Do some prototyping and see how it looks.

Here's a starting point: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Basically, there is no 'one size fits all' here.  Part of the power of Solr / Lucene is its configurability to achieve the results your business case calls for.  Part of the drawback of Solr / Lucene - especially for new folks - is its configurability to achieve the results your business case calls for. :)

Anyone got anything else to suggest for Michael?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | Bob.Sandiford@sirsidynix.com
www.sirsidynix.com

RE: select query does not find indexed pdf document

Posted by Bob Sandiford <bo...@sirsidynix.com>.

Um - looks like you specified your id value as "pdfy", which is reflected in the results from the "*:*" query, but your id query is searching for "vpn", hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | Bob.Sandiford@sirsidynix.com
www.sirsidynix.com

Re: select query does not find indexed pdf document

Posted by Michael Dockery <do...@yahoo.com>.

http://www/SearchApp/select/?q=id:vpn

yields this:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
    <lst name="params">
      <str name="q">id:vpn</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>


*****************************************

http://www/SearchApp/select/?q=*:*

yields this:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="q">*.*</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="author">doc</str>
      <arr name="content_type">
        <str>application/pdf</str>
      </arr>
      <str name="id">pdfy</str>
      <date name="last_modified">2011-05-20T02:08:48Z</date>
      <arr name="title">
        <str>dmvpndeploy.pdf</str>
      </arr>
    </doc>
  </result>
</response>
 

Re: select query does not find indexed pdf document

Posted by Jan Høydahl <ja...@cominvent.com>.

Hi,

What do you get from a query http://www/SearchApp/select/?q=*:* or http://www/SearchApp/select/?q=id:vpn ?
You may not have mapped the fields correctly to your schema?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
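For reference, the field mapping Jan mentions is governed by ExtractingRequestHandler parameters, which can be set as handler defaults in solrconfig.xml or passed per request on the curl URL. A sketch along the lines of the Solr 3.x example configuration, assuming the schema has a text field and the stock ignored_* dynamic field:

```xml
<!-- solrconfig.xml: defaults for the handler behind /update/extract -->
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- send Tika's extracted body text to the schema's "text" field -->
    <str name="fmap.content">text</str>
    <!-- lowercase field names with underscores, e.g. Content-Type -> content_type -->
    <str name="lowernames">true</str>
    <!-- prefix field names the schema does not define, so they fall
         into the ignored_* dynamic field instead of raising an error -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

If the target text field is indexed but not stored (as in the stock example schema), the extracted content will be searchable but will not appear in the *:* results.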
