Posted to solr-user@lucene.apache.org by javaxmlsoapdev <vi...@yahoo.com> on 2009/11/23 23:04:46 UTC
ExternalRequestHandler and ContentStreamUpdateRequest usage
The following code is from my test case; it tries to index a .txt file:
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978"); // key is the uniqueId
up.setParam("ext.literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);
The test case doesn't give me any errors, and I think it's indexing the file, but when I search for text that was part of the .txt file, the search doesn't return anything.
The following is the config from solrconfig.xml, where I have mapped the extracted content to the "description" field (the default search field) in the schema:
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.content">description</str>
    <str name="defaultField">description</str>
  </lst>
</requestHandler>
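[Editor's note] For comparison, here is a sketch of how this handler is usually configured on Solr 1.4, where ExtractingRequestHandler expects the `fmap.` prefix for field mappings rather than `map.`; parameter names varied across the earlier Solr Cell patches, so treat this as an assumption to verify against your Solr version:

```xml
<!-- Sketch only: assumes Solr 1.4, where ExtractingRequestHandler maps
     extracted fields with the fmap.<source> prefix (map.content is not
     a recognized parameter there). -->
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- route the main extracted body text into the description field -->
    <str name="fmap.content">description</str>
    <!-- fallback field for extracted content with no explicit mapping -->
    <str name="defaultField">description</str>
  </lst>
</requestHandler>
```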
Clearly I am missing something. Any ideas?
Thanks,
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26486817.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
Grant, can you assist? I am clueless as to why it's not indexing the content of the file. I have provided the schema and code info below and in previous threads. Do I need to explicitly add param("content", "") to the ContentStreamUpdateRequest? I don't think that's the right thing to do. Please advise.
Let me know if you need anything else. I appreciate your help.
Thanks,
javaxmlsoapdev wrote:
>
> Following is the Luke response. <lst name="fields" /> is empty. Can someone
> assist in finding out why the file content isn't being indexed?
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">0</int>
> </lst>
> <lst name="index">
> <int name="numDocs">0</int>
> <int name="maxDoc">0</int>
> <int name="numTerms">0</int>
> <long name="version">1259085661332</long>
> <bool name="optimized">false</bool>
> <bool name="current">true</bool>
> <bool name="hasDeletions">false</bool>
> <str
> name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
> <date name="lastModified">2009-11-24T18:01:01Z</date>
> </lst>
> <lst name="fields" />
> <lst name="info">
> <lst name="key">
> <str name="I">Indexed</str>
> <str name="T">Tokenized</str>
> <str name="S">Stored</str>
> <str name="M">Multivalued</str>
> <str name="V">TermVector Stored</str>
> <str name="o">Store Offset With TermVector</str>
> <str name="p">Store Position With TermVector</str>
> <str name="O">Omit Norms</str>
> <str name="L">Lazy</str>
> <str name="B">Binary</str>
> <str name="C">Compressed</str>
> <str name="f">Sort Missing First</str>
> <str name="l">Sort Missing Last</str>
> </lst>
> <str name="NOTE">Document Frequency (df) is not updated when a document
> is marked for deletion. df values include deleted documents.</str>
> </lst>
> </response>
>
> javaxmlsoapdev wrote:
>>
>> I was able to configure /docs index separately from my db data index.
>>
>> still I am seeing same behavior where it only puts .docName & its size in
>> the "content" field (I have renamed field to "content" in this new
>> schema)
>>
>> below are the only two fields I have in schema.xml
>> <field name="key" type="slong" indexed="true" stored="true"
>> required="true" />
>> <field name="content" type="text" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> Following is updated code from test case
>>
>> File fileToIndex = new File("file.txt");
>>
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>>
>> Also from solr admin UI when I search for "doc123.txt" then only it
>> returns me following response. not sure why its not indexing file's
>> content into "content" attribute.
>> - <result name="response" numFound="1" start="0">
>> - <doc>
>> - <arr name="content">
>> <str>702</str>
>> <str>text/plain</str>
>> <str>doc123.txt</str>
>> <str />
>> </arr>
>> <long name="key">8978</long>
>> </doc>
>> </result>
>>
>> Any idea?
>>
>> Thanks,
>>
>>
>> javaxmlsoapdev wrote:
>>>
>>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>>> its not able to find luke.
>>>
>>> I am reusing schema, which is used for indexing other entity from
>>> database, which has no relevance to documents. that was my next question
>>> that what do I put in, in a schema if my documents don't need any column
>>> mappings or anything. plus I want to keep file documents index
>>> separately from database entity index. what's the best way to do this?
>>> If I don't have any db columns etc to map and file documents index
>>> should leave separate from db entity index, what's the best way to
>>> achieve this.
>>>
>>> thanks,
>>>
>>>
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>>
>>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>>
>>>>>
>>>>> *:* returns me 1 count but when I search for specific word (which was
>>>>> part of
>>>>> .txt file I indexed before) it doesn't return me anything. I don't
>>>>> have luke
>>>>> setup on my end.
>>>>
>>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>>
>>>>
>>>>> let me see if I can set that up quickly but otherwise do
>>>>> you see anything I am missing in solrconfig mapping or something?
>>>>
>>>> What's your schema look like and how are you querying?
>>>>
>>>>> which maps
>>>>> document "content" to wrong attribute?
>>>>>
>>>>> thanks,
>>>>>
>>>>> Grant Ingersoll-6 wrote:
>>>>>>
>>>>>>
>>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>>>
>>>>>>>
>>>>>>> Following code is from my test case where it tries to index a file
>>>>>>> (of
>>>>>>> type
>>>>>>> .txt)
>>>>>>> ContentStreamUpdateRequest up = new
>>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>>> up.addFile(fileToIndex);
>>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>>>> server.request(up);
>>>>>>>
>>>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>>>> file?
>>>>>>> but
>>>>>>> when I search for a text (which was part of the .txt file) search
>>>>>>> doesn't
>>>>>>> return me anything.
>>>>>>
>>>>>> What do your logs show? Else, what does Luke show or doing a *:*
>>>>>> query
>>>>>> (assuming this is the only file you added)?
>>>>>>
>>>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>>>
>>>>>>>
>>>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>>>> content
>>>>>>> to
>>>>>>> "description" field(default search field) in the schema.
>>>>>>>
>>>>>>> <requestHandler name="/update/extract"
>>>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>>> <lst name="defaults">
>>>>>>> <str name="map.content">description</str>
>>>>>>> <str name="defaultField">description</str>
>>>>>>> </lst>
>>>>>>> </requestHandler>
>>>>>>>
>>>>>>> Clearly it seems I am missing something. Any idea?
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------
>>>>>> Grant Ingersoll
>>>>>> http://www.lucidimagination.com/
>>>>>>
>>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>>>> using
>>>>>> Solr/Lucene:
>>>>>> http://www.lucidimagination.com/search
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26513001.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by Lance Norskog <go...@gmail.com>.
If you are using multicore, you have to run Luke on a particular core:
http://machine:port/solr/core/admin/luke
And, admin itself:
http://machine:port/solr/core/admin
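[Editor's note] A minimal multicore layout for keeping a file-document index separate from a database index might look like the following solr.xml sketch; the core names here are hypothetical, and paths are relative to the Solr home directory:

```xml
<!-- Sketch only: two cores so the file-document index stays separate
     from the database-entity index. Core names are made up. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- index built from the database entities -->
    <core name="dbdata" instanceDir="dbdata" />
    <!-- index for extracted file documents -->
    <core name="docs" instanceDir="docs" />
  </cores>
</solr>
```

With this layout, Luke for the documents core would be reached at http://machine:port/solr/docs/admin/luke.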
On Tue, Nov 24, 2009 at 10:18 AM, javaxmlsoapdev <vi...@yahoo.com> wrote:
>
> Following is the Luke response. <lst name="fields" /> is empty. Can someone
> assist in finding out why the file content isn't being indexed?
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">0</int>
> </lst>
> <lst name="index">
> <int name="numDocs">0</int>
> <int name="maxDoc">0</int>
> <int name="numTerms">0</int>
> <long name="version">1259085661332</long>
> <bool name="optimized">false</bool>
> <bool name="current">true</bool>
> <bool name="hasDeletions">false</bool>
> <str
> name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
> <date name="lastModified">2009-11-24T18:01:01Z</date>
> </lst>
> <lst name="fields" />
> <lst name="info">
> <lst name="key">
> <str name="I">Indexed</str>
> <str name="T">Tokenized</str>
> <str name="S">Stored</str>
> <str name="M">Multivalued</str>
> <str name="V">TermVector Stored</str>
> <str name="o">Store Offset With TermVector</str>
> <str name="p">Store Position With TermVector</str>
> <str name="O">Omit Norms</str>
> <str name="L">Lazy</str>
> <str name="B">Binary</str>
> <str name="C">Compressed</str>
> <str name="f">Sort Missing First</str>
> <str name="l">Sort Missing Last</str>
> </lst>
> <str name="NOTE">Document Frequency (df) is not updated when a document is
> marked for deletion. df values include deleted documents.</str>
> </lst>
> </response>
>
> javaxmlsoapdev wrote:
>>
>> I was able to configure /docs index separately from my db data index.
>>
>> still I am seeing same behavior where it only puts .docName & its size in
>> the "content" field (I have renamed field to "content" in this new schema)
>>
>> below are the only two fields I have in schema.xml
>> <field name="key" type="slong" indexed="true" stored="true"
>> required="true" />
>> <field name="content" type="text" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> Following is updated code from test case
>>
>> File fileToIndex = new File("file.txt");
>>
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>>
>> Also from solr admin UI when I search for "doc123.txt" then only it
>> returns me following response. not sure why its not indexing file's
>> content into "content" attribute.
>> - <result name="response" numFound="1" start="0">
>> - <doc>
>> - <arr name="content">
>> <str>702</str>
>> <str>text/plain</str>
>> <str>doc123.txt</str>
>> <str />
>> </arr>
>> <long name="key">8978</long>
>> </doc>
>> </result>
>>
>> Any idea?
>>
>> Thanks,
>>
>>
>> javaxmlsoapdev wrote:
>>>
>>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>>> its not able to find luke.
>>>
>>> I am reusing schema, which is used for indexing other entity from
>>> database, which has no relevance to documents. that was my next question
>>> that what do I put in, in a schema if my documents don't need any column
>>> mappings or anything. plus I want to keep file documents index separately
>>> from database entity index. what's the best way to do this? If I don't
>>> have any db columns etc to map and file documents index should leave
>>> separate from db entity index, what's the best way to achieve this.
>>>
>>> thanks,
>>>
>>>
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>>
>>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>>
>>>>>
>>>>> *:* returns me 1 count but when I search for specific word (which was
>>>>> part of
>>>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>>>> luke
>>>>> setup on my end.
>>>>
>>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>>
>>>>
>>>>> let me see if I can set that up quickly but otherwise do
>>>>> you see anything I am missing in solrconfig mapping or something?
>>>>
>>>> What's your schema look like and how are you querying?
>>>>
>>>>> which maps
>>>>> document "content" to wrong attribute?
>>>>>
>>>>> thanks,
>>>>>
>>>>> Grant Ingersoll-6 wrote:
>>>>>>
>>>>>>
>>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>>>
>>>>>>>
>>>>>>> Following code is from my test case where it tries to index a file
>>>>>>> (of
>>>>>>> type
>>>>>>> .txt)
>>>>>>> ContentStreamUpdateRequest up = new
>>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>>> up.addFile(fileToIndex);
>>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>>>> server.request(up);
>>>>>>>
>>>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>>>> file?
>>>>>>> but
>>>>>>> when I search for a text (which was part of the .txt file) search
>>>>>>> doesn't
>>>>>>> return me anything.
>>>>>>
>>>>>> What do your logs show? Else, what does Luke show or doing a *:*
>>>>>> query
>>>>>> (assuming this is the only file you added)?
>>>>>>
>>>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>>>
>>>>>>>
>>>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>>>> content
>>>>>>> to
>>>>>>> "description" field(default search field) in the schema.
>>>>>>>
>>>>>>> <requestHandler name="/update/extract"
>>>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>>> <lst name="defaults">
>>>>>>> <str name="map.content">description</str>
>>>>>>> <str name="defaultField">description</str>
>>>>>>> </lst>
>>>>>>> </requestHandler>
>>>>>>>
>>>>>>> Clearly it seems I am missing something. Any idea?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
--
Lance Norskog
goksron@gmail.com
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
The following is the Luke response. <lst name="fields" /> is empty. Can someone
assist in finding out why the file content isn't being indexed?
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="index">
<int name="numDocs">0</int>
<int name="maxDoc">0</int>
<int name="numTerms">0</int>
<long name="version">1259085661332</long>
<bool name="optimized">false</bool>
<bool name="current">true</bool>
<bool name="hasDeletions">false</bool>
<str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
<date name="lastModified">2009-11-24T18:01:01Z</date>
</lst>
<lst name="fields" />
<lst name="info">
<lst name="key">
<str name="I">Indexed</str>
<str name="T">Tokenized</str>
<str name="S">Stored</str>
<str name="M">Multivalued</str>
<str name="V">TermVector Stored</str>
<str name="o">Store Offset With TermVector</str>
<str name="p">Store Position With TermVector</str>
<str name="O">Omit Norms</str>
<str name="L">Lazy</str>
<str name="B">Binary</str>
<str name="C">Compressed</str>
<str name="f">Sort Missing First</str>
<str name="l">Sort Missing Last</str>
</lst>
<str name="NOTE">Document Frequency (df) is not updated when a document is
marked for deletion. df values include deleted documents.</str>
</lst>
</response>
javaxmlsoapdev wrote:
>
> I was able to configure /docs index separately from my db data index.
>
> still I am seeing same behavior where it only puts .docName & its size in
> the "content" field (I have renamed field to "content" in this new schema)
>
> below are the only two fields I have in schema.xml
> <field name="key" type="slong" indexed="true" stored="true"
> required="true" />
> <field name="content" type="text" indexed="true" stored="true"
> multiValued="true"/>
>
> Following is updated code from test case
>
> File fileToIndex = new File("file.txt");
>
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978");
> up.setParam("literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> NamedList list = server.request(up);
> assertNotNull("Couldn't upload .txt",list);
>
> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
> assertEquals( 1, rsp.getResults().getNumFound() );
> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>
> Also from solr admin UI when I search for "doc123.txt" then only it
> returns me following response. not sure why its not indexing file's
> content into "content" attribute.
> - <result name="response" numFound="1" start="0">
> - <doc>
> - <arr name="content">
> <str>702</str>
> <str>text/plain</str>
> <str>doc123.txt</str>
> <str />
> </arr>
> <long name="key">8978</long>
> </doc>
> </result>
>
> Any idea?
>
> Thanks,
>
>
> javaxmlsoapdev wrote:
>>
>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>> its not able to find luke.
>>
>> I am reusing schema, which is used for indexing other entity from
>> database, which has no relevance to documents. that was my next question
>> that what do I put in, in a schema if my documents don't need any column
>> mappings or anything. plus I want to keep file documents index separately
>> from database entity index. what's the best way to do this? If I don't
>> have any db columns etc to map and file documents index should leave
>> separate from db entity index, what's the best way to achieve this.
>>
>> thanks,
>>
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>>
>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>
>>>>
>>>> *:* returns me 1 count but when I search for specific word (which was
>>>> part of
>>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>>> luke
>>>> setup on my end.
>>>
>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>
>>>
>>>> let me see if I can set that up quickly but otherwise do
>>>> you see anything I am missing in solrconfig mapping or something?
>>>
>>> What's your schema look like and how are you querying?
>>>
>>>> which maps
>>>> document "content" to wrong attribute?
>>>>
>>>> thanks,
>>>>
>>>> Grant Ingersoll-6 wrote:
>>>>>
>>>>>
>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>>
>>>>>>
>>>>>> Following code is from my test case where it tries to index a file
>>>>>> (of
>>>>>> type
>>>>>> .txt)
>>>>>> ContentStreamUpdateRequest up = new
>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>> up.addFile(fileToIndex);
>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>>> server.request(up);
>>>>>>
>>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>>> file?
>>>>>> but
>>>>>> when I search for a text (which was part of the .txt file) search
>>>>>> doesn't
>>>>>> return me anything.
>>>>>
>>>>> What do your logs show? Else, what does Luke show or doing a *:*
>>>>> query
>>>>> (assuming this is the only file you added)?
>>>>>
>>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>>
>>>>>>
>>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>>> content
>>>>>> to
>>>>>> "description" field(default search field) in the schema.
>>>>>>
>>>>>> <requestHandler name="/update/extract"
>>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>> <lst name="defaults">
>>>>>> <str name="map.content">description</str>
>>>>>> <str name="defaultField">description</str>
>>>>>> </lst>
>>>>>> </requestHandler>
>>>>>>
>>>>>> Clearly it seems I am missing something. Any idea?
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26499908.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
I was able to configure the /docs index separately from my db data index.
Still, I am seeing the same behavior: it only puts the docName and its size in the "content" field (I have renamed the field to "content" in this new schema).
Below are the only two fields I have in schema.xml:
<field name="key" type="slong" indexed="true" stored="true" required="true" />
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
The following is the updated code from the test case:
File fileToIndex = new File("file.txt");

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978");
up.setParam("literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList list = server.request(up);
assertNotNull("Couldn't upload .txt", list);

QueryResponse rsp = server.query(new SolrQuery("*:*"));
assertEquals(1, rsp.getResults().getNumFound());
System.out.println(rsp.getResults().get(0).getFieldValue("content"));
Also, from the Solr admin UI, only when I search for "doc123.txt" does it return the following response. I am not sure why it's not indexing the file's content into the "content" field.
<result name="response" numFound="1" start="0">
<doc>
<arr name="content">
<str>702</str>
<str>text/plain</str>
<str>doc123.txt</str>
<str />
</arr>
<long name="key">8978</long>
</doc>
</result>
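[Editor's note] The four stored values above look like Tika metadata (stream size, content type, and file name) rather than the body text, which suggests the extracted content is not being mapped into "content" at all. One hedged way to handle this on Solr 1.4 is to map the body text explicitly with `fmap.content` and push unmapped metadata fields aside with `uprefix`, for example:

```xml
<!-- Sketch only: assumes Solr 1.4 ExtractingRequestHandler parameters.
     fmap.content routes the extracted body text into the content field;
     uprefix prefixes any unmapped extracted field so a dynamic-field
     rule in schema.xml can catch (and optionally ignore) it. -->
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>
```

This assumes a matching rule such as `<dynamicField name="attr_*" type="ignored" multiValued="true" />` in schema.xml, with an "ignored" field type defined as in the stock example schema.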
Any idea?
Thanks,
javaxmlsoapdev wrote:
>
> http://machinename:port/solr/admin/luke gives me 404 error so seems like
> its not able to find luke.
>
> I am reusing schema, which is used for indexing other entity from
> database, which has no relevance to documents. that was my next question
> that what do I put in, in a schema if my documents don't need any column
> mappings or anything. plus I want to keep file documents index separately
> from database entity index. what's the best way to do this? If I don't
> have any db columns etc to map and file documents index should leave
> separate from db entity index, what's the best way to achieve this.
>
> thanks,
>
>
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>
>>>
>>> *:* returns me 1 count but when I search for specific word (which was
>>> part of
>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>> luke
>>> setup on my end.
>>
>> http://localhost:8983/solr/admin/luke should give you some info.
>>
>>
>>> let me see if I can set that up quickly but otherwise do
>>> you see anything I am missing in solrconfig mapping or something?
>>
>> What's your schema look like and how are you querying?
>>
>>> which maps
>>> document "content" to wrong attribute?
>>>
>>> thanks,
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>>
>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>
>>>>>
>>>>> Following code is from my test case where it tries to index a file (of
>>>>> type
>>>>> .txt)
>>>>> ContentStreamUpdateRequest up = new
>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>> up.addFile(fileToIndex);
>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>> server.request(up);
>>>>>
>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>> file?
>>>>> but
>>>>> when I search for a text (which was part of the .txt file) search
>>>>> doesn't
>>>>> return me anything.
>>>>
>>>> What do your logs show? Else, what does Luke show or doing a *:* query
>>>> (assuming this is the only file you added)?
>>>>
>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>
>>>>>
>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>> content
>>>>> to
>>>>> "description" field(default search field) in the schema.
>>>>>
>>>>> <requestHandler name="/update/extract"
>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>> <lst name="defaults">
>>>>> <str name="map.content">description</str>
>>>>> <str name="defaultField">description</str>
>>>>> </lst>
>>>>> </requestHandler>
>>>>>
>>>>> Clearly it seems I am missing something. Any idea?
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498552.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
I was able to configure the /docs index separately from my db data index.
Still, I am seeing the same behavior: it only puts the docName and its size in the "content" field (I have renamed the field to "content" in this new schema).
Below are the only two fields I have in schema.xml:
<field name="key" type="slong" indexed="true" stored="true" required="true" />
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
The following is the updated code from the test case:

File fileToIndex = new File("file.txt");

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978");
up.setParam("literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList list = server.request(up);
assertNotNull("Couldn't upload .txt", list);

QueryResponse rsp = server.query(new SolrQuery("*:*"));
assertEquals(1, rsp.getResults().getNumFound());
System.out.println(rsp.getResults().get(0).getFieldValue("content"));
Also, from the Solr admin UI, only when I search for "doc123.txt" does it return the following response. I am not sure why it's not indexing the file's content into the "content" field.
<result name="response" numFound="1" start="0">
<doc>
<arr name="content">
<str>702</str>
<str>text/plain</str>
<str>doc123.txt</str>
<str />
</arr>
<long name="key">8978</long>
</doc>
</result>
Any idea?
Thanks,
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498946.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
http://machinename:port/solr/admin/luke gives me a 404 error, so it seems it's not able to find Luke.
I am reusing a schema that is used for indexing another entity from the database, which has no relevance to documents. That was my next question: what do I put in a schema if my documents don't need any column mappings or anything? Plus, I want to keep the file-document index separate from the database entity index. What's the best way to achieve this?
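[Editor's note] For a core that only stores extracted file documents, the schema can be quite small. A hedged minimal sketch, assuming the stock "slong" and "text" field types from the Solr 1.4 example schema:

```xml
<!-- Sketch only: minimal schema.xml fields for a file-documents core;
     assumes the "slong" and "text" field types are defined as in the
     Solr 1.4 example schema. -->
<fields>
  <field name="key" type="slong" indexed="true" stored="true" required="true" />
  <field name="content" type="text" indexed="true" stored="true" multiValued="true" />
</fields>
<uniqueKey>key</uniqueKey>
<defaultSearchField>content</defaultSearchField>
```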
thanks,
Grant Ingersoll-6 wrote:
>
>
> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>
>>
>> *:* returns me 1 count but when I search for specific word (which was
>> part of
>> .txt file I indexed before) it doesn't return me anything. I don't have
>> luke
>> setup on my end.
>
> http://localhost:8983/solr/admin/luke should give you some info.
>
>
>> let me see if I can set that up quickly but otherwise do
>> you see anything I am missing in solrconfig mapping or something?
>
> What's your schema look like and how are you querying?
>
>> which maps
>> document "content" to wrong attribute?
>>
>> thanks,
>>
>> Grant Ingersoll-6 wrote:
>>>
>>>
>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>
>>>>
>>>> Following code is from my test case where it tries to index a file (of
>>>> type
>>>> .txt)
>>>> ContentStreamUpdateRequest up = new
>>>> ContentStreamUpdateRequest("/update/extract");
>>>> up.addFile(fileToIndex);
>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>> server.request(up);
>>>>
>>>> test case doesn't give me any error and "I think" its indexing the
>>>> file?
>>>> but
>>>> when I search for a text (which was part of the .txt file) search
>>>> doesn't
>>>> return me anything.
>>>
>>> What do your logs show? Else, what does Luke show or doing a *:* query
>>> (assuming this is the only file you added)?
>>>
>>> Also, I don't think you need ext.literal anymore, just literal.
>>>
>>>>
>>>> Following is the config from solrconfig.xml where I have mapped content
>>>> to
>>>> "description" field(default search field) in the schema.
>>>>
>>>> <requestHandler name="/update/extract"
>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>> <lst name="defaults">
>>>> <str name="map.content">description</str>
>>>> <str name="defaultField">description</str>
>>>> </lst>
>>>> </requestHandler>
>>>>
>>>> Clearly it seems I am missing something. Any idea?
>>>
>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26497295.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>
> *:* returns me 1 count but when I search for specific word (which was part of
> .txt file I indexed before) it doesn't return me anything. I don't have luke
> setup on my end.
http://localhost:8983/solr/admin/luke should give you some info.
> let me see if I can set that up quickly but otherwise do
> you see anything I am missing in solrconfig mapping or something?
What's your schema look like and how are you querying?
> which maps
> document "content" to wrong attribute?
>
> thanks,
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>
>>>
>>> Following code is from my test case where it tries to index a file (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>> server.request(up);
>>>
>>> test case doesn't give me any error and "I think" its indexing the file?
>>> but
>>> when I search for a text (which was part of the .txt file) search doesn't
>>> return me anything.
>>
>> What do your logs show? Else, what does Luke show or doing a *:* query
>> (assuming this is the only file you added)?
>>
>> Also, I don't think you need ext.literal anymore, just literal.
>>
>>>
>>> Following is the config from solrconfig.xml where I have mapped content
>>> to
>>> "description" field(default search field) in the schema.
>>>
>>> <requestHandler name="/update/extract"
>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>> <lst name="defaults">
>>> <str name="map.content">description</str>
>>> <str name="defaultField">description</str>
>>> </lst>
>>> </requestHandler>
>>>
>>> Clearly it seems I am missing something. Any idea?
>>
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
> --
> View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
FYI: weirdly, it's returning me the following when I run
rsp.getResults().get(0).getFieldValue("description")
[702, text/plain, doc123.txt, ]
so it seems to be storing
up.setParam("ext.literal.docName", "doc123.txt"); into "description" rather
than the file content.
Any idea?
Thanks,
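One thing worth double-checking: in Solr 1.4 the ExtractingRequestHandler's field-mapping prefix is fmap., not map., so a map.content default may be silently ignored and the extracted text never reaches "description". A sketch of the corrected handler config (verify the parameter names against your Solr version's documentation):

```xml
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- route Tika's extracted "content" field into the searchable field -->
    <str name="fmap.content">description</str>
    <str name="defaultField">description</str>
  </lst>
</requestHandler>
```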
javaxmlsoapdev wrote:
>
> *:* returns me 1 count but when I search for specific word (which was part
> of .txt file I indexed before) it doesn't return me anything. I don't have
> luke setup on my end. let me see if I can set that up quickly but
> otherwise do you see anything I am missing in solrconfig mapping or
> something? which maps document "content" to wrong attribute?
>
> thanks,
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>
>>>
>>> Following code is from my test case where it tries to index a file (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>> server.request(up);
>>>
>>> test case doesn't give me any error and "I think" its indexing the file?
>>> but
>>> when I search for a text (which was part of the .txt file) search
>>> doesn't
>>> return me anything.
>>
>> What do your logs show? Else, what does Luke show or doing a *:* query
>> (assuming this is the only file you added)?
>>
>> Also, I don't think you need ext.literal anymore, just literal.
>>
>>>
>>> Following is the config from solrconfig.xml where I have mapped content
>>> to
>>> "description" field(default search field) in the schema.
>>>
>>> <requestHandler name="/update/extract"
>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>> <lst name="defaults">
>>> <str name="map.content">description</str>
>>> <str name="defaultField">description</str>
>>> </lst>
>>> </requestHandler>
>>>
>>> Clearly it seems I am missing something. Any idea?
>>
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487409.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
*:* returns me 1 count, but when I search for a specific word (which was part of
the .txt file I indexed before) it doesn't return anything. I don't have Luke
set up on my end. Let me see if I can set that up quickly, but otherwise do you
see anything I am missing in the solrconfig mapping or something that maps the
document "content" to the wrong attribute?
thanks,
Grant Ingersoll-6 wrote:
>
>
> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>
>>
>> Following code is from my test case where it tries to index a file (of
>> type
>> .txt)
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978"); //key is the uniqueId
>> up.setParam("ext.literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> server.request(up);
>>
>> test case doesn't give me any error and "I think" its indexing the file?
>> but
>> when I search for a text (which was part of the .txt file) search doesn't
>> return me anything.
>
> What do your logs show? Else, what does Luke show or doing a *:* query
> (assuming this is the only file you added)?
>
> Also, I don't think you need ext.literal anymore, just literal.
>
>>
>> Following is the config from solrconfig.xml where I have mapped content
>> to
>> "description" field(default search field) in the schema.
>>
>> <requestHandler name="/update/extract"
>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>> <lst name="defaults">
>> <str name="map.content">description</str>
>> <str name="defaultField">description</str>
>> </lst>
>> </requestHandler>
>>
>> Clearly it seems I am missing something. Any idea?
>
>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>
> Following code is from my test case where it tries to index a file (of type
> .txt)
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978"); //key is the uniqueId
> up.setParam("ext.literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> server.request(up);
>
> test case doesn't give me any error and "I think" its indexing the file? but
> when I search for a text (which was part of the .txt file) search doesn't
> return me anything.
What do your logs show? Otherwise, what does Luke show, or what does a *:* query return (assuming this is the only file you added)?
Also, I don't think you need ext.literal anymore, just literal.
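As an illustration of the plain literal. naming, here is a hypothetical, self-contained sketch that only builds the query string an extract request would carry; it does not talk to Solr, and the class and helper names are invented for the example:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractUrlSketch {

    // Build an application/x-www-form-urlencoded query string from the
    // params, preserving insertion order.
    static String buildQuery(Map<String, String> params) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.key", "8978");           // uniqueKey field value
        params.put("literal.docName", "doc123.txt"); // plain literal., no ext. prefix
        params.put("commit", "true");
        // Prints: /update/extract?literal.key=8978&literal.docName=doc123.txt&commit=true
        System.out.println("/update/extract?" + buildQuery(params));
    }
}
```

The same literal.* pairs are what ContentStreamUpdateRequest.setParam() sends, so this shows the shape of the request without needing a running server.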
>
> Following is the config from solrconfig.xml where I have mapped content to
> "description" field(default search field) in the schema.
>
> <requestHandler name="/update/extract"
> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
> <lst name="defaults">
> <str name="map.content">description</str>
> <str name="defaultField">description</str>
> </lst>
> </requestHandler>
>
> Clearly it seems I am missing something. Any idea?
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search