Posted to solr-user@lucene.apache.org by javaxmlsoapdev <vi...@yahoo.com> on 2009/11/23 23:04:46 UTC
ExternalRequestHandler and ContentStreamUpdateRequest usage
The following code is from my test case; it tries to index a .txt file:
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978"); // key is the uniqueId
up.setParam("ext.literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);
The test case doesn't give me any errors, and I think it's indexing the file, but when I search for text that was part of the .txt file, the search doesn't return anything.
The following is the config from solrconfig.xml, where I have mapped the extracted content to the "description" field (the default search field) in the schema:
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.content">description</str>
    <str name="defaultField">description</str>
  </lst>
</requestHandler>
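[Editor's note] For comparison, here is a sketch of how this handler is usually configured on Solr 1.4, where ExtractingRequestHandler expects the `fmap.` prefix for field mappings rather than `map.`; parameter names varied across the earlier Solr Cell patches, so treat this as an assumption to verify against your Solr version:

```xml
<!-- Sketch only: assumes Solr 1.4, where ExtractingRequestHandler maps
     extracted fields with the fmap.<source> prefix (map.content is not
     a recognized parameter there). -->
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- route the main extracted body text into the description field -->
    <str name="fmap.content">description</str>
    <!-- fallback field for extracted content with no explicit mapping -->
    <str name="defaultField">description</str>
  </lst>
</requestHandler>
```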
Clearly I am missing something. Any ideas?
Thanks,
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26486817.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
Grant, can you assist? I am clueless as to why it's not indexing the content of the file. I have provided the schema and code info below and in previous threads. Do I need to explicitly add param("content", "") to the ContentStreamUpdateRequest? I don't think that's the right thing to do. Please advise.
Let me know if you need anything else. I appreciate your help.
Thanks,
javaxmlsoapdev wrote:
>
> Following is the Luke response. <lst name="fields" /> is empty. Can someone
> assist in finding out why the file content isn't being indexed?
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">0</int>
> </lst>
> <lst name="index">
> <int name="numDocs">0</int>
> <int name="maxDoc">0</int>
> <int name="numTerms">0</int>
> <long name="version">1259085661332</long>
> <bool name="optimized">false</bool>
> <bool name="current">true</bool>
> <bool name="hasDeletions">false</bool>
> <str
> name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
> <date name="lastModified">2009-11-24T18:01:01Z</date>
> </lst>
> <lst name="fields" />
> <lst name="info">
> <lst name="key">
> <str name="I">Indexed</str>
> <str name="T">Tokenized</str>
> <str name="S">Stored</str>
> <str name="M">Multivalued</str>
> <str name="V">TermVector Stored</str>
> <str name="o">Store Offset With TermVector</str>
> <str name="p">Store Position With TermVector</str>
> <str name="O">Omit Norms</str>
> <str name="L">Lazy</str>
> <str name="B">Binary</str>
> <str name="C">Compressed</str>
> <str name="f">Sort Missing First</str>
> <str name="l">Sort Missing Last</str>
> </lst>
> <str name="NOTE">Document Frequency (df) is not updated when a document
> is marked for deletion. df values include deleted documents.</str>
> </lst>
> </response>
>
> javaxmlsoapdev wrote:
>>
>> I was able to configure /docs index separately from my db data index.
>>
>> still I am seeing same behavior where it only puts .docName & its size in
>> the "content" field (I have renamed field to "content" in this new
>> schema)
>>
>> below are the only two fields I have in schema.xml
>> <field name="key" type="slong" indexed="true" stored="true"
>> required="true" />
>> <field name="content" type="text" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> Following is updated code from test case
>>
>> File fileToIndex = new File("file.txt");
>>
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>>
>> Also from solr admin UI when I search for "doc123.txt" then only it
>> returns me following response. not sure why its not indexing file's
>> content into "content" attribute.
>> - <result name="response" numFound="1" start="0">
>> - <doc>
>> - <arr name="content">
>> <str>702</str>
>> <str>text/plain</str>
>> <str>doc123.txt</str>
>> <str />
>> </arr>
>> <long name="key">8978</long>
>> </doc>
>> </result>
>>
>> Any idea?
>>
>> Thanks,
>>
>>
>> javaxmlsoapdev wrote:
>>>
>>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>>> its not able to find luke.
>>>
>>> I am reusing schema, which is used for indexing other entity from
>>> database, which has no relevance to documents. that was my next question
>>> that what do I put in, in a schema if my documents don't need any column
>>> mappings or anything. plus I want to keep file documents index
>>> separately from database entity index. what's the best way to do this?
>>> If I don't have any db columns etc to map and file documents index
>>> should leave separate from db entity index, what's the best way to
>>> achieve this.
>>>
>>> thanks,
>>>
>>>
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>>
>>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>>
>>>>>
>>>>> *:* returns me 1 count but when I search for specific word (which was
>>>>> part of
>>>>> .txt file I indexed before) it doesn't return me anything. I don't
>>>>> have luke
>>>>> setup on my end.
>>>>
>>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>>
>>>>
>>>>> let me see if I can set that up quickly but otherwise do
>>>>> you see anything I am missing in solrconfig mapping or something?
>>>>
>>>> What's your schema look like and how are you querying?
>>>>
>>>>> which maps
>>>>> document "content" to wrong attribute?
>>>>>
>>>>> thanks,
>>>>>
>>>>> Grant Ingersoll-6 wrote:
>>>>>>
>>>>>>
>>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>>>
>>>>>>>
>>>>>>> Following code is from my test case where it tries to index a file
>>>>>>> (of
>>>>>>> type
>>>>>>> .txt)
>>>>>>> ContentStreamUpdateRequest up = new
>>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>>> up.addFile(fileToIndex);
>>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>>>> server.request(up);
>>>>>>>
>>>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>>>> file?
>>>>>>> but
>>>>>>> when I search for a text (which was part of the .txt file) search
>>>>>>> doesn't
>>>>>>> return me anything.
>>>>>>
>>>>>> What do your logs show? Else, what does Luke show or doing a *:*
>>>>>> query
>>>>>> (assuming this is the only file you added)?
>>>>>>
>>>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>>>
>>>>>>>
>>>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>>>> content
>>>>>>> to
>>>>>>> "description" field(default search field) in the schema.
>>>>>>>
>>>>>>> <requestHandler name="/update/extract"
>>>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>>> <lst name="defaults">
>>>>>>> <str name="map.content">description</str>
>>>>>>> <str name="defaultField">description</str>
>>>>>>> </lst>
>>>>>>> </requestHandler>
>>>>>>>
>>>>>>> Clearly it seems I am missing something. Any idea?
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------
>>>>>> Grant Ingersoll
>>>>>> http://www.lucidimagination.com/
>>>>>>
>>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>>>> using
>>>>>> Solr/Lucene:
>>>>>> http://www.lucidimagination.com/search
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26513001.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by Lance Norskog <go...@gmail.com>.
If you are using multicore, you have to run Luke on a particular core:
http://machine:port/solr/core/admin/luke
And, admin itself:
http://machine:port/solr/core/admin
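[Editor's note] A minimal multicore layout for keeping a file-document index separate from a database index might look like the following solr.xml sketch; the core names here are hypothetical, and paths are relative to the Solr home directory:

```xml
<!-- Sketch only: two cores so the file-document index stays separate
     from the database-entity index. Core names are made up. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- index built from the database entities -->
    <core name="dbdata" instanceDir="dbdata" />
    <!-- index for extracted file documents -->
    <core name="docs" instanceDir="docs" />
  </cores>
</solr>
```

With this layout, Luke for the documents core would be reached at http://machine:port/solr/docs/admin/luke.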
On Tue, Nov 24, 2009 at 10:18 AM, javaxmlsoapdev <vi...@yahoo.com> wrote:
>
> Following is the Luke response. <lst name="fields" /> is empty. Can someone
> assist in finding out why the file content isn't being indexed?
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">0</int>
> </lst>
> <lst name="index">
> <int name="numDocs">0</int>
> <int name="maxDoc">0</int>
> <int name="numTerms">0</int>
> <long name="version">1259085661332</long>
> <bool name="optimized">false</bool>
> <bool name="current">true</bool>
> <bool name="hasDeletions">false</bool>
> <str
> name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
> <date name="lastModified">2009-11-24T18:01:01Z</date>
> </lst>
> <lst name="fields" />
> <lst name="info">
> <lst name="key">
> <str name="I">Indexed</str>
> <str name="T">Tokenized</str>
> <str name="S">Stored</str>
> <str name="M">Multivalued</str>
> <str name="V">TermVector Stored</str>
> <str name="o">Store Offset With TermVector</str>
> <str name="p">Store Position With TermVector</str>
> <str name="O">Omit Norms</str>
> <str name="L">Lazy</str>
> <str name="B">Binary</str>
> <str name="C">Compressed</str>
> <str name="f">Sort Missing First</str>
> <str name="l">Sort Missing Last</str>
> </lst>
> <str name="NOTE">Document Frequency (df) is not updated when a document is
> marked for deletion. df values include deleted documents.</str>
> </lst>
> </response>
>
> javaxmlsoapdev wrote:
>>
>> I was able to configure /docs index separately from my db data index.
>>
>> still I am seeing same behavior where it only puts .docName & its size in
>> the "content" field (I have renamed field to "content" in this new schema)
>>
>> below are the only two fields I have in schema.xml
>> <field name="key" type="slong" indexed="true" stored="true"
>> required="true" />
>> <field name="content" type="text" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> Following is updated code from test case
>>
>> File fileToIndex = new File("file.txt");
>>
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978");
>> up.setParam("literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> NamedList list = server.request(up);
>> assertNotNull("Couldn't upload .txt",list);
>>
>> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
>> assertEquals( 1, rsp.getResults().getNumFound() );
>> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>>
>> Also from solr admin UI when I search for "doc123.txt" then only it
>> returns me following response. not sure why its not indexing file's
>> content into "content" attribute.
>> - <result name="response" numFound="1" start="0">
>> - <doc>
>> - <arr name="content">
>> <str>702</str>
>> <str>text/plain</str>
>> <str>doc123.txt</str>
>> <str />
>> </arr>
>> <long name="key">8978</long>
>> </doc>
>> </result>
>>
>> Any idea?
>>
>> Thanks,
>>
>>
>> javaxmlsoapdev wrote:
>>>
>>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>>> its not able to find luke.
>>>
>>> I am reusing schema, which is used for indexing other entity from
>>> database, which has no relevance to documents. that was my next question
>>> that what do I put in, in a schema if my documents don't need any column
>>> mappings or anything. plus I want to keep file documents index separately
>>> from database entity index. what's the best way to do this? If I don't
>>> have any db columns etc to map and file documents index should leave
>>> separate from db entity index, what's the best way to achieve this.
>>>
>>> thanks,
>>>
>>>
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>>
>>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>>
>>>>>
>>>>> *:* returns me 1 count but when I search for specific word (which was
>>>>> part of
>>>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>>>> luke
>>>>> setup on my end.
>>>>
>>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>>
>>>>
>>>>> let me see if I can set that up quickly but otherwise do
>>>>> you see anything I am missing in solrconfig mapping or something?
>>>>
>>>> What's your schema look like and how are you querying?
>>>>
>>>>> which maps
>>>>> document "content" to wrong attribute?
>>>>>
>>>>> thanks,
>>>>>
>>>>> Grant Ingersoll-6 wrote:
>>>>>>
>>>>>>
>>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>>>
>>>>>>>
>>>>>>> Following code is from my test case where it tries to index a file
>>>>>>> (of
>>>>>>> type
>>>>>>> .txt)
>>>>>>> ContentStreamUpdateRequest up = new
>>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>>> up.addFile(fileToIndex);
>>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>>>> server.request(up);
>>>>>>>
>>>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>>>> file?
>>>>>>> but
>>>>>>> when I search for a text (which was part of the .txt file) search
>>>>>>> doesn't
>>>>>>> return me anything.
>>>>>>
>>>>>> What do your logs show? Else, what does Luke show or doing a *:*
>>>>>> query
>>>>>> (assuming this is the only file you added)?
>>>>>>
>>>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>>>
>>>>>>>
>>>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>>>> content
>>>>>>> to
>>>>>>> "description" field(default search field) in the schema.
>>>>>>>
>>>>>>> <requestHandler name="/update/extract"
>>>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>>> <lst name="defaults">
>>>>>>> <str name="map.content">description</str>
>>>>>>> <str name="defaultField">description</str>
>>>>>>> </lst>
>>>>>>> </requestHandler>
>>>>>>>
>>>>>>> Clearly it seems I am missing something. Any idea?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
--
Lance Norskog
goksron@gmail.com
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
The following is the Luke response. <lst name="fields" /> is empty. Can someone
assist in finding out why the file content isn't being indexed?
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="index">
<int name="numDocs">0</int>
<int name="maxDoc">0</int>
<int name="numTerms">0</int>
<long name="version">1259085661332</long>
<bool name="optimized">false</bool>
<bool name="current">true</bool>
<bool name="hasDeletions">false</bool>
<str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
<date name="lastModified">2009-11-24T18:01:01Z</date>
</lst>
<lst name="fields" />
<lst name="info">
<lst name="key">
<str name="I">Indexed</str>
<str name="T">Tokenized</str>
<str name="S">Stored</str>
<str name="M">Multivalued</str>
<str name="V">TermVector Stored</str>
<str name="o">Store Offset With TermVector</str>
<str name="p">Store Position With TermVector</str>
<str name="O">Omit Norms</str>
<str name="L">Lazy</str>
<str name="B">Binary</str>
<str name="C">Compressed</str>
<str name="f">Sort Missing First</str>
<str name="l">Sort Missing Last</str>
</lst>
<str name="NOTE">Document Frequency (df) is not updated when a document is
marked for deletion. df values include deleted documents.</str>
</lst>
</response>
javaxmlsoapdev wrote:
>
> I was able to configure /docs index separately from my db data index.
>
> still I am seeing same behavior where it only puts .docName & its size in
> the "content" field (I have renamed field to "content" in this new schema)
>
> below are the only two fields I have in schema.xml
> <field name="key" type="slong" indexed="true" stored="true"
> required="true" />
> <field name="content" type="text" indexed="true" stored="true"
> multiValued="true"/>
>
> Following is updated code from test case
>
> File fileToIndex = new File("file.txt");
>
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978");
> up.setParam("literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> NamedList list = server.request(up);
> assertNotNull("Couldn't upload .txt",list);
>
> QueryResponse rsp = server.query( new SolrQuery( "*:*") );
> assertEquals( 1, rsp.getResults().getNumFound() );
> System.out.println(rsp.getResults().get(0).getFieldValue("content"));
>
> Also from solr admin UI when I search for "doc123.txt" then only it
> returns me following response. not sure why its not indexing file's
> content into "content" attribute.
> - <result name="response" numFound="1" start="0">
> - <doc>
> - <arr name="content">
> <str>702</str>
> <str>text/plain</str>
> <str>doc123.txt</str>
> <str />
> </arr>
> <long name="key">8978</long>
> </doc>
> </result>
>
> Any idea?
>
> Thanks,
>
>
> javaxmlsoapdev wrote:
>>
>> http://machinename:port/solr/admin/luke gives me 404 error so seems like
>> its not able to find luke.
>>
>> I am reusing schema, which is used for indexing other entity from
>> database, which has no relevance to documents. that was my next question
>> that what do I put in, in a schema if my documents don't need any column
>> mappings or anything. plus I want to keep file documents index separately
>> from database entity index. what's the best way to do this? If I don't
>> have any db columns etc to map and file documents index should leave
>> separate from db entity index, what's the best way to achieve this.
>>
>> thanks,
>>
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>>
>>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>>
>>>>
>>>> *:* returns me 1 count but when I search for specific word (which was
>>>> part of
>>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>>> luke
>>>> setup on my end.
>>>
>>> http://localhost:8983/solr/admin/luke should give you some info.
>>>
>>>
>>>> let me see if I can set that up quickly but otherwise do
>>>> you see anything I am missing in solrconfig mapping or something?
>>>
>>> What's your schema look like and how are you querying?
>>>
>>>> which maps
>>>> document "content" to wrong attribute?
>>>>
>>>> thanks,
>>>>
>>>> Grant Ingersoll-6 wrote:
>>>>>
>>>>>
>>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>>
>>>>>>
>>>>>> Following code is from my test case where it tries to index a file
>>>>>> (of
>>>>>> type
>>>>>> .txt)
>>>>>> ContentStreamUpdateRequest up = new
>>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>>> up.addFile(fileToIndex);
>>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>>> server.request(up);
>>>>>>
>>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>>> file?
>>>>>> but
>>>>>> when I search for a text (which was part of the .txt file) search
>>>>>> doesn't
>>>>>> return me anything.
>>>>>
>>>>> What do your logs show? Else, what does Luke show or doing a *:*
>>>>> query
>>>>> (assuming this is the only file you added)?
>>>>>
>>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>>
>>>>>>
>>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>>> content
>>>>>> to
>>>>>> "description" field(default search field) in the schema.
>>>>>>
>>>>>> <requestHandler name="/update/extract"
>>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>>> <lst name="defaults">
>>>>>> <str name="map.content">description</str>
>>>>>> <str name="defaultField">description</str>
>>>>>> </lst>
>>>>>> </requestHandler>
>>>>>>
>>>>>> Clearly it seems I am missing something. Any idea?
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26499908.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
I was able to configure the /docs index separately from my db data index.
Still, I am seeing the same behavior: it only puts the docName and its size in the "content" field (I have renamed the field to "content" in this new schema).
Below are the only two fields I have in schema.xml:
<field name="key" type="slong" indexed="true" stored="true" required="true" />
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
The following is the updated code from the test case:
File fileToIndex = new File("file.txt");

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978");
up.setParam("literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList list = server.request(up);
assertNotNull("Couldn't upload .txt", list);

QueryResponse rsp = server.query(new SolrQuery("*:*"));
assertEquals(1, rsp.getResults().getNumFound());
System.out.println(rsp.getResults().get(0).getFieldValue("content"));
Also, from the Solr admin UI, only when I search for "doc123.txt" does it return the following response. I am not sure why it's not indexing the file's content into the "content" field.
<result name="response" numFound="1" start="0">
<doc>
<arr name="content">
<str>702</str>
<str>text/plain</str>
<str>doc123.txt</str>
<str />
</arr>
<long name="key">8978</long>
</doc>
</result>
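[Editor's note] The four stored values above look like Tika metadata (stream size, content type, and file name) rather than the body text, which suggests the extracted content is not being mapped into "content" at all. One hedged way to handle this on Solr 1.4 is to map the body text explicitly with `fmap.content` and push unmapped metadata fields aside with `uprefix`, for example:

```xml
<!-- Sketch only: assumes Solr 1.4 ExtractingRequestHandler parameters.
     fmap.content routes the extracted body text into the content field;
     uprefix prefixes any unmapped extracted field so a dynamic-field
     rule in schema.xml can catch (and optionally ignore) it. -->
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>
```

This assumes a matching rule such as `<dynamicField name="attr_*" type="ignored" multiValued="true" />` in schema.xml, with an "ignored" field type defined as in the stock example schema.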
Any idea?
Thanks,
javaxmlsoapdev wrote:
>
> http://machinename:port/solr/admin/luke gives me 404 error so seems like
> its not able to find luke.
>
> I am reusing schema, which is used for indexing other entity from
> database, which has no relevance to documents. that was my next question
> that what do I put in, in a schema if my documents don't need any column
> mappings or anything. plus I want to keep file documents index separately
> from database entity index. what's the best way to do this? If I don't
> have any db columns etc to map and file documents index should leave
> separate from db entity index, what's the best way to achieve this.
>
> thanks,
>
>
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>>
>>>
>>> *:* returns me 1 count but when I search for specific word (which was
>>> part of
>>> .txt file I indexed before) it doesn't return me anything. I don't have
>>> luke
>>> setup on my end.
>>
>> http://localhost:8983/solr/admin/luke should give you some info.
>>
>>
>>> let me see if I can set that up quickly but otherwise do
>>> you see anything I am missing in solrconfig mapping or something?
>>
>> What's your schema look like and how are you querying?
>>
>>> which maps
>>> document "content" to wrong attribute?
>>>
>>> thanks,
>>>
>>> Grant Ingersoll-6 wrote:
>>>>
>>>>
>>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>>
>>>>>
>>>>> Following code is from my test case where it tries to index a file (of
>>>>> type
>>>>> .txt)
>>>>> ContentStreamUpdateRequest up = new
>>>>> ContentStreamUpdateRequest("/update/extract");
>>>>> up.addFile(fileToIndex);
>>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>>> server.request(up);
>>>>>
>>>>> test case doesn't give me any error and "I think" its indexing the
>>>>> file?
>>>>> but
>>>>> when I search for a text (which was part of the .txt file) search
>>>>> doesn't
>>>>> return me anything.
>>>>
>>>> What do your logs show? Else, what does Luke show or doing a *:* query
>>>> (assuming this is the only file you added)?
>>>>
>>>> Also, I don't think you need ext.literal anymore, just literal.
>>>>
>>>>>
>>>>> Following is the config from solrconfig.xml where I have mapped
>>>>> content
>>>>> to
>>>>> "description" field(default search field) in the schema.
>>>>>
>>>>> <requestHandler name="/update/extract"
>>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>>> <lst name="defaults">
>>>>> <str name="map.content">description</str>
>>>>> <str name="defaultField">description</str>
>>>>> </lst>
>>>>> </requestHandler>
>>>>>
>>>>> Clearly it seems I am missing something. Any idea?
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498552.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
I was able to configure the /docs index separately from my db data index.
Still, I am seeing the same behavior: it only puts the docName and its size in the "content" field (I have renamed the field to "content" in this new schema).
Below are the only two fields I have in schema.xml:
<field name="key" type="slong" indexed="true" stored="true" required="true" />
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
The following is the updated code from the test case:

File fileToIndex = new File("file.txt");

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978");
up.setParam("literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList list = server.request(up);
assertNotNull("Couldn't upload .txt", list);

QueryResponse rsp = server.query(new SolrQuery("*:*"));
assertEquals(1, rsp.getResults().getNumFound());
System.out.println(rsp.getResults().get(0).getFieldValue("content"));
Also, from the Solr admin UI, only when I search for "doc123.txt" does it return the following response. I am not sure why it's not indexing the file's content into the "content" field.
<result name="response" numFound="1" start="0">
<doc>
<arr name="content">
<str>702</str>
<str>text/plain</str>
<str>doc123.txt</str>
<str />
</arr>
<long name="key">8978</long>
</doc>
</result>
Any idea?
Thanks,
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26498946.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
http://machinename:port/solr/admin/luke gives me a 404 error, so it seems it's not able to find Luke.
I am reusing a schema that is used for indexing another entity from the database, which has no relevance to documents. That was my next question: what do I put in a schema if my documents don't need any column mappings or anything? Plus, I want to keep the file-document index separate from the database entity index. What's the best way to achieve this?
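[Editor's note] For a core that only stores extracted file documents, the schema can be quite small. A hedged minimal sketch, assuming the stock "slong" and "text" field types from the Solr 1.4 example schema:

```xml
<!-- Sketch only: minimal schema.xml fields for a file-documents core;
     assumes the "slong" and "text" field types are defined as in the
     Solr 1.4 example schema. -->
<fields>
  <field name="key" type="slong" indexed="true" stored="true" required="true" />
  <field name="content" type="text" indexed="true" stored="true" multiValued="true" />
</fields>
<uniqueKey>key</uniqueKey>
<defaultSearchField>content</defaultSearchField>
```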
thanks,
Grant Ingersoll-6 wrote:
>
>
> On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>
>>
>> *:* returns me 1 count but when I search for specific word (which was
>> part of
>> .txt file I indexed before) it doesn't return me anything. I don't have
>> luke
>> setup on my end.
>
> http://localhost:8983/solr/admin/luke should give you some info.
>
>
>> let me see if I can set that up quickly but otherwise do
>> you see anything I am missing in solrconfig mapping or something?
>
> What's your schema look like and how are you querying?
>
>> which maps
>> document "content" to wrong attribute?
>>
>> thanks,
>>
>> Grant Ingersoll-6 wrote:
>>>
>>>
>>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>>
>>>>
>>>> Following code is from my test case where it tries to index a file (of
>>>> type
>>>> .txt)
>>>> ContentStreamUpdateRequest up = new
>>>> ContentStreamUpdateRequest("/update/extract");
>>>> up.addFile(fileToIndex);
>>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>>> up.setParam("ext.literal.docName", "doc123.txt");
>>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>> server.request(up);
>>>>
>>>> test case doesn't give me any error and "I think" its indexing the
>>>> file?
>>>> but
>>>> when I search for a text (which was part of the .txt file) search
>>>> doesn't
>>>> return me anything.
>>>
>>> What do your logs show? Else, what does Luke show or doing a *:* query
>>> (assuming this is the only file you added)?
>>>
>>> Also, I don't think you need ext.literal anymore, just literal.
>>>
>>>>
>>>> Following is the config from solrconfig.xml where I have mapped content
>>>> to
>>>> "description" field(default search field) in the schema.
>>>>
>>>> <requestHandler name="/update/extract"
>>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>>> <lst name="defaults">
>>>> <str name="map.content">description</str>
>>>> <str name="defaultField">description</str>
>>>> </lst>
>>>> </requestHandler>
>>>>
>>>> Clearly it seems I am missing something. Any idea?
>>>
>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26497295.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
>
> *:* returns me 1 count but when I search for specific word (which was part of
> .txt file I indexed before) it doesn't return me anything. I don't have luke
> setup on my end.
http://localhost:8983/solr/admin/luke should give you some info.
> let me see if I can set that up quickly but otherwise do
> you see anything I am missing in solrconfig mapping or something?
What's your schema look like and how are you querying?
> which maps
> document "content" to wrong attribute?
>
> thanks,
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>
>>>
>>> Following code is from my test case where it tries to index a file (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>> server.request(up);
>>>
>>> test case doesn't give me any error and "I think" its indexing the file?
>>> but
>>> when I search for a text (which was part of the .txt file) search doesn't
>>> return me anything.
>>
>> What do your logs show? Else, what does Luke show or doing a *:* query
>> (assuming this is the only file you added)?
>>
>> Also, I don't think you need ext.literal anymore, just literal.
>>
>>>
>>> Following is the config from solrconfig.xml where I have mapped content
>>> to
>>> "description" field(default search field) in the schema.
>>>
>>> <requestHandler name="/update/extract"
>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>> <lst name="defaults">
>>> <str name="map.content">description</str>
>>> <str name="defaultField">description</str>
>>> </lst>
>>> </requestHandler>
>>>
>>> Clearly it seems I am missing something. Any idea?
>>
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
> --
> View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
FYI: weirdly, it's returning me the following when I run
rsp.getResults().get(0).getFieldValue("description")
[702, text/plain, doc123.txt, ]
so it seems to be storing
up.setParam("ext.literal.docName", "doc123.txt"); into "description" rather
than the file content.
Any idea?
Thanks,
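One thing worth double-checking: in Solr 1.4 the ExtractingRequestHandler's field-mapping prefix is fmap., not map., so a map.content default may be silently ignored and the extracted text never reaches "description". A sketch of the corrected handler config (verify the parameter names against your Solr version's documentation):

```xml
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- route Tika's extracted "content" field into the searchable field -->
    <str name="fmap.content">description</str>
    <str name="defaultField">description</str>
  </lst>
</requestHandler>
```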
javaxmlsoapdev wrote:
>
> *:* returns me 1 count but when I search for specific word (which was part
> of .txt file I indexed before) it doesn't return me anything. I don't have
> luke setup on my end. let me see if I can set that up quickly but
> otherwise do you see anything I am missing in solrconfig mapping or
> something? which maps document "content" to wrong attribute?
>
> thanks,
>
> Grant Ingersoll-6 wrote:
>>
>>
>> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>>
>>>
>>> Following code is from my test case where it tries to index a file (of
>>> type
>>> .txt)
>>> ContentStreamUpdateRequest up = new
>>> ContentStreamUpdateRequest("/update/extract");
>>> up.addFile(fileToIndex);
>>> up.setParam("literal.key", "8978"); //key is the uniqueId
>>> up.setParam("ext.literal.docName", "doc123.txt");
>>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>> server.request(up);
>>>
>>> test case doesn't give me any error and "I think" its indexing the file?
>>> but
>>> when I search for a text (which was part of the .txt file) search
>>> doesn't
>>> return me anything.
>>
>> What do your logs show? Else, what does Luke show or doing a *:* query
>> (assuming this is the only file you added)?
>>
>> Also, I don't think you need ext.literal anymore, just literal.
>>
>>>
>>> Following is the config from solrconfig.xml where I have mapped content
>>> to
>>> "description" field(default search field) in the schema.
>>>
>>> <requestHandler name="/update/extract"
>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>>> <lst name="defaults">
>>> <str name="map.content">description</str>
>>> <str name="defaultField">description</str>
>>> </lst>
>>> </requestHandler>
>>>
>>> Clearly it seems I am missing something. Any idea?
>>
>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487409.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by javaxmlsoapdev <vi...@yahoo.com>.
*:* returns me 1 count, but when I search for a specific word (which was part of
the .txt file I indexed before) it doesn't return anything. I don't have Luke
set up on my end. Let me see if I can set that up quickly, but otherwise do you
see anything I am missing in the solrconfig mapping or something that maps the
document "content" to the wrong attribute?
thanks,
Grant Ingersoll-6 wrote:
>
>
> On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>
>>
>> Following code is from my test case where it tries to index a file (of
>> type
>> .txt)
>> ContentStreamUpdateRequest up = new
>> ContentStreamUpdateRequest("/update/extract");
>> up.addFile(fileToIndex);
>> up.setParam("literal.key", "8978"); //key is the uniqueId
>> up.setParam("ext.literal.docName", "doc123.txt");
>> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>> server.request(up);
>>
>> test case doesn't give me any error and "I think" its indexing the file?
>> but
>> when I search for a text (which was part of the .txt file) search doesn't
>> return me anything.
>
> What do your logs show? Else, what does Luke show or doing a *:* query
> (assuming this is the only file you added)?
>
> Also, I don't think you need ext.literal anymore, just literal.
>
>>
>> Following is the config from solrconfig.xml where I have mapped content
>> to
>> "description" field(default search field) in the schema.
>>
>> <requestHandler name="/update/extract"
>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>> <lst name="defaults">
>> <str name="map.content">description</str>
>> <str name="defaultField">description</str>
>> </lst>
>> </requestHandler>
>>
>> Clearly it seems I am missing something. Any idea?
>
>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>
--
View this message in context: http://old.nabble.com/ExternalRequestHandler-and-ContentStreamUpdateRequest-usage-tp26486817p26487320.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
>
> Following code is from my test case where it tries to index a file (of type
> .txt)
> ContentStreamUpdateRequest up = new
> ContentStreamUpdateRequest("/update/extract");
> up.addFile(fileToIndex);
> up.setParam("literal.key", "8978"); //key is the uniqueId
> up.setParam("ext.literal.docName", "doc123.txt");
> up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> server.request(up);
>
> test case doesn't give me any error and "I think" its indexing the file? but
> when I search for a text (which was part of the .txt file) search doesn't
> return me anything.
What do your logs show? Otherwise, what does Luke show, or what does a *:* query return (assuming this is the only file you added)?
Also, I don't think you need ext.literal anymore, just literal.
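As an illustration of the plain literal. naming, here is a hypothetical, self-contained sketch that only builds the query string an extract request would carry; it does not talk to Solr, and the class and helper names are invented for the example:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractUrlSketch {

    // Build an application/x-www-form-urlencoded query string from the
    // params, preserving insertion order.
    static String buildQuery(Map<String, String> params) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.key", "8978");           // uniqueKey field value
        params.put("literal.docName", "doc123.txt"); // plain literal., no ext. prefix
        params.put("commit", "true");
        // Prints: /update/extract?literal.key=8978&literal.docName=doc123.txt&commit=true
        System.out.println("/update/extract?" + buildQuery(params));
    }
}
```

The same literal.* pairs are what ContentStreamUpdateRequest.setParam() sends, so this shows the shape of the request without needing a running server.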
>
> Following is the config from solrconfig.xml where I have mapped content to
> "description" field(default search field) in the schema.
>
> <requestHandler name="/update/extract"
> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
> <lst name="defaults">
> <str name="map.content">description</str>
> <str name="defaultField">description</str>
> </lst>
> </requestHandler>
>
> Clearly it seems I am missing something. Any idea?
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search