Posted to solr-user@lucene.apache.org by Dinesh <md...@karunya.edu.in> on 2011/01/07 06:01:51 UTC

Input raw log file

How can I give a raw log file as input to Solr instead of an XML file? I'm
working with log files from a DHCP server and I want to index the data. Please
help.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2210043.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Input raw log file

Posted by Peter Karich <pe...@yahoo.de>.
 Dinesh,

It will stay 'real time' even if you convert it. The conversion should take
only milliseconds, if it is measurable at all (e.g. if you apply
streaming).
Beware: to use the real-time features you'll need the latest trunk of Solr, IMHO.

I've done similar log-feeding stuff here (with code!):
http://karussell.wordpress.com/2010/10/27/feeding-solr-with-its-own-logs/
(not with a realtime solr!)
You'll have to adapt the parser/matcher to fit your needs.
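
To illustrate why streaming keeps the conversion cheap, here is a minimal
sketch of a generator that converts one line at a time without loading the
whole file. The syslog-style layout ("Jan  7 06:01:51 host message") and the
field names are assumptions for illustration; they are not taken from the
poster's logs.

```python
from typing import Iterable, Iterator


def stream_records(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily turn syslog-style lines into dicts, one line at a time.

    Assumed layout (hypothetical): "<month> <day> <hh:mm:ss> <host> <message>".
    Lines with fewer than five whitespace-separated parts are skipped.
    """
    for line in lines:
        parts = line.split(None, 4)  # split on whitespace runs, at most 5 parts
        if len(parts) < 5:
            continue
        month, day, clock, host, message = parts
        yield {
            "timestamp": f"{month} {day} {clock}",
            "host": host,
            "message": message.rstrip(),
        }
```

Because an open file is itself an iterable of lines, something like
`for record in stream_records(open("dhcpd.log"))` would process the log
incrementally (the file name here is only a placeholder).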

Regards,
Peter.

> If I convert it to CSV or XML it will be time-consuming, because indexing
> and retrieving the data should happen in real time. Is there any other way
> to do this? If not, how can I convert the logs to CSV or XML? And lastly,
> which is the doc folder of Solr?


-- 
http://jetwick.com open twitter search


Re: Input raw log file

Posted by Dennis Gearon <ge...@sbcglobal.net>.
A possible shortcut?

Write a regex that will parse out the fields as you want them, put that into 
some shell script that calls solr?
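
A minimal sketch of the regex idea, in Python rather than a shell script. The
message layout ("DHCPACK on <ip> to <mac> ...") is assumed to resemble ISC
dhcpd output, and the field names `action`, `ip`, and `mac` are hypothetical;
the pattern would need adapting to the actual log format.

```python
import re

# Assumed message shape (hypothetical):
#   "DHCPACK on 10.0.0.5 to 00:11:22:33:44:55 via eth0"
DHCP_RE = re.compile(
    r"(?P<action>DHCP[A-Z]+) on (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"to (?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def parse_dhcp_message(message: str):
    """Return a dict of the named groups, or None if the message does not match."""
    match = DHCP_RE.search(message)
    return match.groupdict() if match else None
```

Returning `None` for non-matching lines lets the caller decide whether to
skip them or log them for later inspection.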

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.





Re: Input raw log file

Posted by "Grijesh.singh" <pi...@gmail.com>.
The first thing to understand is that Solr cannot read your raw log files.
Solr needs data that matches the defined schema, and it does not know your
log file format.

So you have to write a parser program that converts your log files into one
of the formats Solr accepts. Then you will be able to index that data.
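
As a rough sketch of the last step of such a parser, the following Python
turns already-parsed records into Solr's XML update format (an `<add>`
element containing `<doc>` elements with named `<field>` children). The field
names used in the example call are assumptions; they would have to match
fields declared in your schema.xml.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(records):
    """Build a Solr <add> message with one <doc> per record (a dict of field -> value)."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")
```

For example, `to_solr_add_xml([{"id": "1", "ip": "10.0.0.5"}])` produces a
single `<add>` document that could be saved to a file and posted in the same
way as the sample XML files.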



-----
Grijesh

Re: Input raw log file

Posted by Dinesh <md...@karunya.edu.in>.
I got the idea of creating a DIH (DataImportHandler) configuration and working
with that. Thanks, everyone, for the help. I will try to create a regex-based
DIH setup; I guess that's the right approach.

Re: Input raw log file

Posted by Gora Mohanty <go...@mimirtech.com>.
On Wed, Jan 12, 2011 at 12:10 PM, Dinesh <md...@karunya.edu.in> wrote:
>
> If I convert it to CSV or XML it will be time-consuming, because indexing
> and retrieving the data should happen in real time. Is there any other way
> to do this? If not, how can I convert the logs to CSV or XML? And lastly,
> which is the doc folder of Solr?
[...]

What is "real time" for you? Conversion should be pretty fast.

Also, you could use a FileDataSource, LineEntityProcessor,
and a RegexTransformer to pick up data right from the text
files. This is why I recommended this link to you originally:
http://robotlibrarian.billdueber.com/an-exercise-in-solr-and-dataimporthandler-hathitrust-data/

Regards,
Gora

Re: Input raw log file

Posted by Dinesh <md...@karunya.edu.in>.
If I convert it to CSV or XML it will be time-consuming, because indexing
and retrieving the data should happen in real time. Is there any other way
to do this? If not, how can I convert the logs to CSV or XML? And lastly,
which is the doc folder of Solr?

Re: Input raw log file

Posted by "Grijesh.singh" <pi...@gmail.com>.
It will not work.
I think your log files are not Solr XML documents.

Your log files are raw data. You have to convert them to a format Solr can
read, either Solr's XML document format or CSV, before you can index them,
as Gora suggested.

-----
Grijesh

Re: Input raw log file

Posted by Dinesh <md...@karunya.edu.in>.
I copied it to the same exampledocs folder and ran
#java -jar post.jar log.txt

and I got

SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file log.txt
SimplePostTool: FATAL: Solr returned an error:
Unexpected_character_S_code_83_in_prolog_expected___at_rowcol_unknownsource_11


Re: Input raw log file

Posted by "Grijesh.singh" <pi...@gmail.com>.
How did you parse your log?
Which approach did you take to index the log file data?
Have you tried any of what Gora Mohanty suggested?

I am local to the Delhi NCR area.

-----
Grijesh

Re: Input raw log file

Posted by Gora Mohanty <go...@mimirtech.com>.
On Wed, Jan 12, 2011 at 11:50 AM, Dinesh <md...@karunya.edu.in> wrote:
>
> I have installed Solr, tested it with the sample XML file, and tried indexing.
> Everything was successful, but when I tried with log files I got an error.

Please provide details of what you are doing, and of the error messages.
How exactly are you sending the data files to Solr for indexing? Also,
note that you will most likely need to change the default schema.xml.

> I tried reading schema.xml but didn't get a clear idea of it. Can you please
> help?

It is very difficult to try to help you, given the scarce details that you
provide. I would again suggest that you look for someone local to help
you out. Alternatively, read carefully through the extensive documentation
on the Solr Wiki, or get a copy of the Solr book:
https://www.packtpub.com/solr-1-4-enterprise-search-server/book

Regards,
Gora

Re: Input raw log file

Posted by Dinesh <md...@karunya.edu.in>.
I have installed Solr, tested it with the sample XML file, and tried indexing.
Everything was successful, but when I tried with log files I got an error.
I tried reading schema.xml but didn't get a clear idea of it. Can you please
help?

Re: Input raw log file

Posted by Gora Mohanty <go...@mimirtech.com>.
On Tue, Jan 11, 2011 at 10:06 AM, Dinesh <md...@karunya.edu.in> wrote:
>
> Can you give an example, something that is currently being used?

Sorry, I do not have anything like this at hand at the moment.

> I'm an engineering student and my project is to index all the real-time
> log files from different devices, apply some artificial intelligence, and
> produce useful data from them. I'm doing this for my college. I have been
> struggling for more than a month even to get started.
[...]

It should not really be that hard. Did you go through the Solr tutorial,
get the example working, and grasp the basics of Solr indexing, and
search? If so, then it is just a matter of setting up what should be a
simple Solr schema, extracting the relevant data from the log files,
and posting it to Solr for indexing. Where exactly in this process are
you having trouble? Can you post a small (say, 10-15 lines) excerpt
of your log files, indicating which of the data you want to keep? No
promises, but maybe someone will have the time to take a crack at
it. The other way might be to look for help from a local expert in
Solr.
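
For the "posting it to Solr" step, a minimal Python sketch follows. It only
builds the HTTP request; the update URL is the one printed by post.jar
elsewhere in this thread, and the `text/xml` content type is the usual choice
for Solr's XML update handler. The function name is hypothetical.

```python
import urllib.request

# URL as printed by SimplePostTool in this thread; adjust for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_text: str, url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST of an XML update message to Solr."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` would actually send it,
which requires a running Solr instance. After posting documents, a second
request with the body `<commit/>` makes them searchable.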

Finally, I am afraid that data mining / artificial intelligence is beyond
the scope of Solr, but you could look at something like Apache Mahout.

Regards,
Gora

Re: Input raw log file

Posted by Dinesh <md...@karunya.edu.in>.
Can you give an example, something that is currently being used? I'm an
engineering student and my project is to index all the real-time log files
from different devices, apply some artificial intelligence, and produce
useful data from them. I'm doing this for my college. I have been struggling
for more than a month even to get started.

Re: Input raw log file

Posted by Gora Mohanty <go...@mimirtech.com>.
On Sat, Jan 8, 2011 at 3:50 PM, Dinesh <md...@karunya.edu.in> wrote:
>
> I don't have much idea about converting a log into CSV and then giving it as
> input. Can you please explain exactly how to do it?
[...]

As the format of the raw log file is known only to you, it
is difficult for someone to give you advice on this. What
you need to do is to transform the parts of your input
data that you wish to keep into a format that Solr can
import.

Besides CSV, you might find using the RegexTransformer
easier. This blog article might be of help:
http://robotlibrarian.billdueber.com/an-exercise-in-solr-and-dataimporthandler-hathitrust-data/

Regards,
Gora

Re: Input raw log file

Posted by Dinesh <md...@karunya.edu.in>.
I don't have much idea about converting a log into CSV and then giving it as
input. Can you please explain exactly how to do it?

Re: Input raw log file

Posted by "Grijesh.singh" <pi...@gmail.com>.
There is a CSV update handler in Solr. You can use it by converting your
log file to CSV.
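
A minimal sketch of producing such CSV from parsed records in Python. It
assumes the records are dicts and that the column names match fields in your
schema.xml; the names in the example are hypothetical. The first row is a
header line naming the fields, which is what the CSV handler expects by
default.

```python
import csv
import io


def to_solr_csv(records, fieldnames):
    """Render records (dicts) as CSV text with a header row of field names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # values containing commas are quoted automatically
    return buffer.getvalue()
```

The resulting text could then be sent to the CSV handler (in Solr releases of
this period it is typically mapped at `/update/csv`; check your
solrconfig.xml).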

-----
Grijesh