Posted to user@nutch.apache.org by "shubham.gupta" <sh...@orkash.com> on 2017/01/12 11:50:46 UTC
Changing date format while page is parsed
Hey,
When a webpage is parsed, the date is stored in the dt_stamp field in Long
format, whereas I want to store it in ISODate format. I have tried to
change the type from java.lang.Long to java.util.Date and have changed
the dtStamp setter and getter accordingly.
But after I build the project, these are the errors I get:
GeneratorReducer.java:80: error: incompatible types: long cannot be
converted to Date
[javac] page.setDtStamp(1000L);
[javac] ^
InjectorJob.java:161: error: incompatible types: long cannot be
converted to Date
[javac] row.setDtStamp(curTime);
[javac] ^
FetcherReducer.java:682: error: incompatible types: long cannot be
converted to Date
[javac] fit.page.setDtStamp(2000L);
[javac] ^
[javac] Note: Some input files use unchecked or unsafe operations.
What changes should be made so that dt_stamp can be stored in Date
format?
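
For reference, the three compiler errors above all have the same cause: once the setter takes a java.util.Date, every call site that still passes a long has to wrap it. The following is only a minimal sketch of that idea. The Page class here is a hypothetical stand-in, not Nutch's WebPage, and the setter and getter names are simply copied from the error messages.

```java
import java.util.Date;

public class DtStampSetterSketch {

    // Hypothetical stand-in for the modified page class; this is NOT Nutch's WebPage.
    static class Page {
        private Date dtStamp;

        void setDtStamp(Date value) {
            this.dtStamp = value;
        }

        Date getDtStamp() {
            return dtStamp;
        }
    }

    // Builds a page whose dtStamp is set from epoch milliseconds.
    static Page stamp(long epochMillis) {
        Page page = new Page();
        // page.setDtStamp(epochMillis);        // would not compile: long cannot be converted to Date
        page.setDtStamp(new Date(epochMillis)); // wrap the epoch milliseconds explicitly
        return page;
    }

    public static void main(String[] args) {
        Page page = stamp(System.currentTimeMillis());
        // Date.toInstant() prints as an ISO-8601 timestamp in UTC.
        System.out.println(page.getDtStamp().toInstant());
    }
}
```

Under this assumption, each of the call sites named in the errors (setDtStamp(1000L), setDtStamp(curTime), setDtStamp(2000L)) would need the same new Date(...) wrapping.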
--
Thanks and Regards,
Shubham Gupta
Re: Changing date format while page is parsed
Posted by "shubham.gupta" <sh...@orkash.com>.
org.apache.commons.lang.time.FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss'Z'").format(date);
Where do I have to make these changes?
Thanks and Regards,
Shubham Gupta
On Friday 13 January 2017 09:18 AM, shubham.gupta wrote:
> Also, there is a problem: all documents have a dt_stamp of
> NumberLong(2000), which gives 1 Jan 1970 when converted to Date format.
>
> Thanks,
> Shubham Gupta
>
> On Thursday 12 January 2017 05:30 PM, Furkan KAMACI wrote:
>> Hi Shubham,
>>
>> Do you need to store it in Date format instead of Long? You can store
>> it as Long and convert to ISO Date whenever you want. You can do it
>> like this:
>>
>> org.apache.commons.lang.time.FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss'Z'").format(date);
>>
>>
>> Kind Regards,
>> Furkan KAMACI
Re: Changing date format while page is parsed
Posted by "shubham.gupta" <sh...@orkash.com>.
Also, there is a problem: all documents have a dt_stamp of
NumberLong(2000), which gives 1 Jan 1970 when converted to Date format.
Thanks,
Shubham Gupta
On Thursday 12 January 2017 05:30 PM, Furkan KAMACI wrote:
> Hi Shubham,
>
> Do you need to store it in Date format instead of Long? You can store it as
> Long and convert to ISO Date whenever you want. You can do it like this:
>
> org.apache.commons.lang.time.FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss'Z'").format(date);
>
> Kind Regards,
> Furkan KAMACI
Re: Changing date format while page is parsed
Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Shubham,
Do you need to store it in Date format instead of Long? You can store it as
Long and convert to ISO Date whenever you want. You can do it like this:
org.apache.commons.lang.time.FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss'Z'").format(date);
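
As a self-contained illustration of the same idea, here is a minimal sketch that keeps the value as a long (epoch milliseconds) and formats it on demand. It uses java.time from Java 8 instead of commons-lang so that it runs without extra dependencies; the class and method names are made up for the example. Because the 'Z' in the pattern is a literal, the formatter is pinned to UTC. The one-argument FastDateFormat.getInstance(pattern) uses the JVM default time zone, so the same caveat applies there.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DtStampFormat {

    // Same pattern as in the message above; 'Z' is a literal, so the zone is fixed to UTC.
    private static final DateTimeFormatter ISO_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    // Converts epoch milliseconds (the stored Long) to an ISO-8601 string in UTC.
    static String toIso(long epochMillis) {
        return ISO_UTC.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // 2000 ms after the epoch, i.e. two seconds into 1 Jan 1970.
        System.out.println(toIso(2000L)); // 1970-01-01T00:00:02Z
        System.out.println(toIso(System.currentTimeMillis()));
    }
}
```

This also shows why a stored value of 2000 displays as 1 Jan 1970: the long is interpreted as milliseconds since the epoch.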
Kind Regards,
Furkan KAMACI
Re: Changing date format while page is parsed
Posted by "shubham.gupta" <sh...@orkash.com>.
Look at field 24; it is the dt_stamp field. Also, why has the time
been hardcoded to 1000L and 2000L?
Thanks and Regards,
Shubham Gupta
On Saturday 14 January 2017 12:06 PM, vickyk wrote:
> I have been digging through the Nutch code for our own requirements, and I
> don't see the field name you mentioned in the WebPage class. I see the following:
> public void setModifiedTime(java.lang.Long value) {
>   this.modifiedTime = value;
>   setDirty(6);
> }
>
> I guess you have implemented your own custom plugin with the Parser
> extension point. Is that correct?
> Can you please provide more details? Maybe I can provide some direction.
>
> Thanks,
> Vicky
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Changing-date-format-while-page-is-parsed-tp4313694p4313951.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Changing date format while page is parsed
Posted by vickyk <vi...@gmail.com>.
I have been digging through the Nutch code for our own requirements, and I
don't see the field name you mentioned in the WebPage class. I see the following:
public void setModifiedTime(java.lang.Long value) {
  this.modifiedTime = value;
  setDirty(6);
}
I guess you have implemented your own custom plugin with the Parser
extension point. Is that correct?
Can you please provide more details? Maybe I can provide some direction.
Thanks,
Vicky