Posted to user@nutch.apache.org by "shubham.gupta" <sh...@orkash.com> on 2017/01/12 11:50:46 UTC

Changing date format while page is parsed

Hey,

When a webpage is parsed, the date is stored in the dt_stamp field as a 
Long, whereas I want to store it in ISODate format. I tried changing the 
field's type from java.lang.Long to java.util.Date, and updated the dtStamp 
setter and getter accordingly.

But when I build the project, these are the errors I get:

     GeneratorReducer.java:80: error: incompatible types: long cannot be converted to Date
     [javac]           page.setDtStamp(1000L);
     [javac]                                        ^
     InjectorJob.java:161: error: incompatible types: long cannot be converted to Date
     [javac]         row.setDtStamp(curTime);
     [javac]                                    ^
     FetcherReducer.java:682: error: incompatible types: long cannot be converted to Date
     [javac]             fit.page.setDtStamp(2000L);
     [javac]                                              ^
     [javac] Note: Some input files use unchecked or unsafe operations.

What changes should be made such that I can enable the dt_stamp to be 
stored in Date Format?

-- 
Thanks and Regards,
Shubham Gupta
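
The three errors above share one cause: once setDtStamp takes a 
java.util.Date, every caller that still passes a long has to wrap the value. 
A minimal sketch of that, assuming a hypothetical stand-in class PageStub in 
place of the modified page class (it is not the real Nutch WebPage, and the 
names PageStub and DtStampCallSites are illustrative only):

```java
import java.util.Date;

// Hypothetical stand-in for the modified page class; NOT the real Nutch WebPage.
class PageStub {
    private Date dtStamp;

    void setDtStamp(Date value) { this.dtStamp = value; }

    Date getDtStamp() { return dtStamp; }
}

class DtStampCallSites {
    // Wraps epoch milliseconds in a Date, as each failing call site would need to do.
    static PageStub stamp(long epochMillis) {
        PageStub page = new PageStub();
        page.setDtStamp(new Date(epochMillis)); // was: page.setDtStamp(epochMillis)
        return page;
    }

    public static void main(String[] args) {
        long curTime = System.currentTimeMillis();
        PageStub page = stamp(curTime);
        // The Date keeps the same millisecond value that was passed in.
        System.out.println(page.getDtStamp().getTime() == curTime);
    }
}
```

If the page class is generated from a schema, the schema and the storage 
mapping would probably have to change as well, which this sketch does not cover.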


Re: Changing date format while page is parsed

Posted by "shubham.gupta" <sh...@orkash.com>.
Regarding the suggested call:

org.apache.commons.lang.time.FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss'Z'").format(date);

Where in the code do I have to make this change?

Thanks and Regards,
Shubham Gupta

On Friday 13 January 2017 09:18 AM, shubham.gupta wrote:
> Also, there is a problem: every document has a dt_stamp of 
> NumberLong(2000), which gives 1 Jan 1970 when converted to Date format.
>
> Thanks,
> Shubham Gupta


Re: Changing date format while page is parsed

Posted by "shubham.gupta" <sh...@orkash.com>.
Also, there is a problem: every document has a dt_stamp of 
NumberLong(2000), which gives 1 Jan 1970 when converted to Date format.

Thanks,
Shubham Gupta
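
The 1 Jan 1970 result is what a value of 2000 would produce if it is read as 
milliseconds since the Unix epoch: it is only two seconds past the epoch. A 
small check using java.time (the class name EpochCheck is illustrative, and 
reading the stored value as epoch milliseconds is an assumption):

```java
import java.time.Instant;

class EpochCheck {
    // Renders epoch milliseconds as an ISO-8601 instant in UTC.
    static String asInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(asInstant(2000L));                      // 1970-01-01T00:00:02Z
        System.out.println(asInstant(System.currentTimeMillis())); // a present-day timestamp
    }
}
```

So the conversion itself is behaving correctly; the stored value is a 
constant rather than a real timestamp.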

On Thursday 12 January 2017 05:30 PM, Furkan KAMACI wrote:
> Hi Shubham,
>
> Do you need to store it in Date format instead of Long? You can store it as
> a Long and convert it to an ISO date whenever you need to, for example:
>
> org.apache.commons.lang.time.FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss'Z'").format(date);
>
> Kind Regards,
> Furkan KAMACI


Re: Changing date format while page is parsed

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Shubham,

Do you need to store it in Date format instead of Long? You can store it as
a Long and convert it to an ISO date whenever you need to, for example:

org.apache.commons.lang.time.FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss'Z'").format(date);

Kind Regards,
Furkan KAMACI
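
A self-contained sketch of the same idea, keeping the stored value as a long 
and formatting it only when it is read. It swaps in the JDK's 
java.time.format.DateTimeFormatter for commons-lang's FastDateFormat so that 
it runs without extra dependencies; the class name IsoDates is illustrative. 
Because the 'Z' in the pattern is a quoted literal, the formatter is pinned 
to UTC here, otherwise the output would be local time mislabelled as UTC:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class IsoDates {
    // Same pattern as in the message above; 'Z' is a literal, so the zone is forced to UTC.
    private static final DateTimeFormatter ISO_UTC =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    // Converts a stored epoch-millisecond value to an ISO-8601 string on demand.
    static String toIso(long epochMillis) {
        return ISO_UTC.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(toIso(1483228800000L)); // 2017-01-01T00:00:00Z
    }
}
```

With FastDateFormat, the equivalent precaution would be the 
getInstance(pattern, timeZone) overload with a UTC time zone.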


Re: Changing date format while page is parsed

Posted by "shubham.gupta" <sh...@orkash.com>.
Check field 24; that is the dt_stamp field. Also, why has the time 
been hardcoded to 1000L and 2000L?

Thanks and Regards,
Shubham Gupta

On Saturday 14 January 2017 12:06 PM, vickyk wrote:
> I have been digging through the Nutch code for our own requirements, and I
> don't see the field name you mentioned in the WebPage class. I see the
> following:
>
>   public void setModifiedTime(java.lang.Long value) {
>     this.modifiedTime = value;
>     setDirty(6);
>   }
>
> I guess you have implemented your own custom plugin with the Parser
> extension point. Is that correct?
> Can you please provide more details? Maybe I can provide some direction.
>
> Thanks,
> Vicky


Re: Changing date format while page is parsed

Posted by vickyk <vi...@gmail.com>.
I have been digging through the Nutch code for our own requirements, and I
don't see the field name you mentioned in the WebPage class. I see the
following:

  public void setModifiedTime(java.lang.Long value) {
    this.modifiedTime = value;
    setDirty(6);
  }

I guess you have implemented your own custom plugin with the Parser
extension point. Is that correct?
Can you please provide more details? Maybe I can provide some direction.

Thanks,
Vicky




--
View this message in context: http://lucene.472066.n3.nabble.com/Changing-date-format-while-page-is-parsed-tp4313694p4313951.html
Sent from the Nutch - User mailing list archive at Nabble.com.