Posted to solr-user@lucene.apache.org by harshadmehta <ha...@yahoo.com> on 2012/08/31 05:03:30 UTC

need basic information

I am trying to use Solr for processing logs and searching them.

But I don't see a clear sample anywhere for this type of scenario.

I need to index plain text files. Example content:

yyyy-mm-dd : Account 123 created
yyyy-mm-dd : Account 123 updated
....

Entries for Account 123 are spread across multiple files.

How do I index this so that I can search Account 123 activity over a date
range?

Using the default Solr config, a search returns each log file in full that has
any entry for Account 123, plus all the other accounts in those files as well.

Thanks





--
View this message in context: http://lucene.472066.n3.nabble.com/need-basic-information-tp4004588.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: need basic information

Posted by harshadmehta <ha...@yahoo.com>.
You got what I am looking for, but the indexing part is where I am not sure
how it needs to be done.

So to send these log files for indexing in CSV format, is it just a
configuration change to pull these 3 fields from each line in the text files,
or do I need to write code for that?

I simplified the lines in the text file; the real ones will contain a lot more
text, but I don't think that will matter.

Thanks






Re: need basic information

Posted by pravesh <su...@yahoo.com>.
One basic and trivial solution could be to have a schema like:

Date (of type date/string) --> this would store the 'yyyy-mm-dd' format date
Tag (of type string) --> the text/tag 'Account' goes into this
account-id (of type sint/int) --> account id like '123' goes into this
action (of type string) --> values like 'created'/'updated' go into this

Then just push your logs into Solr using the CSV update handler:
http://wiki.apache.org/solr/UpdateCSV
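
For log lines shaped like the sample in the original question, a small script may be enough to produce the CSV. The following is a minimal sketch in Python, assuming every line looks like "yyyy-mm-dd : Account 123 created". The function name log_lines_to_csv and the regular expression are illustrative assumptions, not part of Solr, and the column names simply mirror the schema above.

```python
import csv
import io
import re

# Assumed line layout: "<date> : <tag> <account-id> <action>", optionally
# followed by more text, which this sketch ignores.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) : (\w+) (\d+) (\w+)")

# Column names mirror the schema fields suggested in the thread.
FIELDS = ["Date", "Tag", "account-id", "action"]


def log_lines_to_csv(lines):
    """Return CSV text with one row per log line that matches LINE_RE."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)  # header row names the target fields
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            writer.writerow(match.groups())
    return out.getvalue()


sample = [
    "2012-08-30 : Account 123 created",
    "2012-08-31 : Account 123 updated",
    "....",
]
csv_text = log_lines_to_csv(sample)
print(csv_text)
```

If Date is declared as a Solr date field rather than a string, the value would need the full timestamp form (for example 2012-08-30T00:00:00Z), so the script would have to append the time part before writing each row.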

Then to get log activity for account id '123', you could query like:

http://localhost:<port>/solr/select/?q=account-id:123&fq=Tag:Account&fq=Date:[d1 TO d2]

Then process the results for plotting/reporting.

Or you could ask for faceting on the 'action' field, like:

http://localhost:<port>/solr/select/?q=account-id:123&fq=Tag:Account&fq=Date:[d1 TO d2]&facet=true&facet.field=action

This way you get facet counts for created/updated/deleted etc.
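
Typed by hand, these URLs need the brackets and spaces in the date range escaped. Here is a minimal Python sketch of building them, assuming the account field is named account-id as in the schema above, Date is stored as a 'yyyy-mm-dd' string, and Solr listens on port 8983 (the thread leaves the port unspecified). The helper name build_activity_query is hypothetical.

```python
from urllib.parse import urlencode


def build_activity_query(base_url, account_id, start, end, facet=False):
    """Build a select URL for one account's activity within a date range."""
    params = [
        ("q", f"account-id:{account_id}"),
        ("fq", "Tag:Account"),
        ("fq", f"Date:[{start} TO {end}]"),  # inclusive range filter
    ]
    if facet:
        # Ask for counts per value of the 'action' field.
        params += [("facet", "true"), ("facet.field", "action")]
    # urlencode escapes ':', '[', ']' and spaces so the URL is safe to send.
    return f"{base_url}/select?{urlencode(params)}"


url = build_activity_query(
    "http://localhost:8983/solr",  # assumed host and port
    123,
    "2012-08-01",
    "2012-08-31",
    facet=True,
)
print(url)
```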

Hope this is what you are looking for.

Thanx
Pravesh





Re: need basic information

Posted by harshadmehta <ha...@yahoo.com>.
I have looked at Splunk and logstash but want to explore Solr to do the job.

Thanks




Re: need basic information

Posted by pravesh <su...@yahoo.com>.
Do logstash/Graylog2 do log processing/searching in real time? Or can they
scale for real-time needs?
I guess harshadmehta is looking for real-time indexing/search.

Regards
Pravesh




Re: need basic information

Posted by Walter Underwood <wu...@wunderwood.org>.
Agreed. There are a lot of products that do this already. Writing it from scratch in Solr seems like a huge waste of time. You should also check out Graylog2: http://graylog2.org/

wunder

On Aug 31, 2012, at 7:05 AM, Alexandre Rafalovitch wrote:

> Have you tried looking at http://logstash.net/ first? Or Splunk
> (http://www.splunk.com/) if you have money.... These might be a better
> starting point than bare SOLR.
> 
> Regards,
>   Alex

--
Walter Underwood
wunder@wunderwood.org




Re: need basic information

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Have you tried looking at http://logstash.net/ first? Or Splunk
(http://www.splunk.com/) if you have money.... These might be a better
starting point than bare SOLR.

Regards,
   Alex
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)



Re: need basic information

Posted by Jack Krupansky <ja...@basetechnology.com>.
Think of the log file as a flat database, each line/entry a "row". So, each 
log line/entry would need to be added to Solr as a separate document.

Maybe you could do this using DIH and a LineEntityProcessor and 
RegexTransformer, DateFormatTransformer, etc.
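
The "one log line, one document" idea can also be sketched outside DIH. The Python below is only an illustration of that idea, not a DIH configuration. It assumes the line layout from the original question and gives each document an id built from the file name and line number, which is a hypothetical convention since the thread does not define a unique key.

```python
import json
import re

# Assumed line layout: "<date> : <tag> <account-id> <action>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) : (\w+) (\d+) (\w+)")


def lines_to_docs(file_name, lines):
    """Yield one dict per matching log line, treating each line as a row."""
    for line_no, line in enumerate(lines, start=1):
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        date, tag, account_id, action = match.groups()
        yield {
            "id": f"{file_name}:{line_no}",  # hypothetical unique key
            "Date": date,
            "Tag": tag,
            "account-id": int(account_id),
            "action": action,
        }


# Two in-memory "files" stand in for real log files on disk.
files = {
    "app-1.log": [
        "2012-08-30 : Account 123 created",
        "2012-08-30 : Account 456 created",
    ],
    "app-2.log": ["2012-08-31 : Account 123 updated"],
}
docs = [doc for name, lines in files.items() for doc in lines_to_docs(name, lines)]
print(json.dumps(docs, indent=2))
```

Because every line becomes its own document, a search for account 123 returns only the matching entries from both files, rather than whole log files.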

-- Jack Krupansky
