Posted to solr-user@lucene.apache.org by Theodor Tolstoy <Th...@sub.su.se> on 2010/11/04 13:13:08 UTC

ContentStreamDataSource

Hi!
I am trying to get the ContentStreamDataSource to work properly, but there are not many examples out there.

What I have done is copy my HttpDataSource config and replace <dataSource type="HttpDataSource" with <dataSource type="ContentStreamDataSource".

If I understand everything correctly, I should be able to use the same URL syntax as with HttpDataSource and supply the XML file as POST data.

I have tried posting the data to the URL as binary, as a file, and as a string, but nothing happens.


This is the log file:
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
VARNING: Unable to read: datapush.properties
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.0
2010-nov-04 12:32:17 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/datapush params={clean=false&entity=suLIBRIS&command=full-import} status=0 QTime=0


What am I doing wrong?

Regards
Theodor Tolstoy
Developer Stockholm university library


Re: ContentStreamDataSource

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
For ContentStreamDataSource to work, you must post the stream in the request.


-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com

Re: ContentStreamDataSource

Posted by Theodor Tolstoy <Th...@sub.su.se>.
I got it to work. There was an error in the requestHandler section of solrconfig.xml. Too bad I had to try almost every possible way of making HTTP POST requests in .NET before realizing that...

For future reference, here is my solution:

// Example url: http://solrserver/solr/datapush?command=full-import&clean=false
// Requires: using System.IO; using System.Net;

protected string PostToSolr(string url, string filePath)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Accept = "text/xml";
    request.Method = "POST";

    // filePath is the local XML file to send; it is separate from the Solr URL.
    // Copy the file into the request body in 1 KB chunks.
    using (FileStream fileStream = File.OpenRead(filePath))
    using (Stream requestStream = request.GetRequestStream())
    {
        int bufferSize = 1024;
        byte[] buffer = new byte[bufferSize];
        int byteCount;
        while ((byteCount = fileStream.Read(buffer, 0, bufferSize)) > 0)
        {
            requestStream.Write(buffer, 0, byteCount);
        }
    }

    // Read Solr's response and return it so the caller can inspect it.
    using (WebResponse response = request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}


Here is my DIH config (dataconfigpush.xml):

<dataConfig>
    <dataSource type="ContentStreamDataSource" connectionTimeout="300000" readTimeout="400000" />
    <document>
        <entity name="suMARC"
                processor="XPathEntityProcessor"
                stream="false"
                forEach="/collection/record"
                onError="continue"
                transformer="DateFormatTransformer, TemplateTransformer">
            <field column="id" xpath="/collection/record/field1" />
            <field column="titlePrimary" xpath="/collection/record/field2" />
        </entity>
    </document>
</dataConfig>

And the relevant requestHandler part of solrconfig.xml:

<requestHandler name="/datapush"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">dataconfigpush.xml</str>
    </lst>
</requestHandler>

Thank you for your help. As you can see, the commands are sent as query-string parameters with the request, so it is not necessary to put them in the config.

-----Original message-----
From: Lance Norskog [mailto:goksron@gmail.com]
Sent: 6 November 2010 05:09
To: solr-user@lucene.apache.org
Subject: Re: ContentStreamDataSource

What program do you use to POST?

How do you give parameters to Solr? Are you doing multipart upload?

You might have to add all of your parameters to a custom requestHandler, like the /dataimport requestHandler.

Post your DIH config file, if you can.

--
Lance Norskog
goksron@gmail.com

Re: ContentStreamDataSource

Posted by Lance Norskog <go...@gmail.com>.
What program do you use to POST?

How do you give parameters to Solr? Are you doing multipart upload?

You might have to add all of your parameters to a custom
requestHandler, like the /dataimport requestHandler.
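
As a rough illustration of that suggestion, the fragment below sketches a requestHandler whose defaults carry the import parameters, so the client only has to POST the XML body. The handler name /datapush and the file name dataconfigpush.xml are taken from elsewhere in this thread; putting command and clean under defaults is an assumption here and should be verified against your Solr version.

```xml
<requestHandler name="/datapush"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dataconfigpush.xml</str>
    <!-- Assumed defaults for illustration; parameters sent in the URL
         would normally take precedence over these values. -->
    <str name="command">full-import</str>
    <str name="clean">false</str>
  </lst>
</requestHandler>
```

With defaults like these, a request to /solr/datapush without a query string would be expected to behave like one carrying command=full-import&clean=false.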

Post your DIH config file, if you can.

-- 
Lance Norskog
goksron@gmail.com