Posted to user@commons.apache.org by Daniel Penning <d....@fire-development.com> on 2008/12/03 18:12:57 UTC

NNTP Client -> identical headers?

Hello,

I am trying to implement an NNTP header parser with Jakarta Commons Net.
Authentication and retrieving the group list work well, but I get strange
results when I download a large number (100,000) of article headers in a
binary group.

A lot of the headers (approx. 10-20%) are identical in subject, author and
group. Only the size, article number and message ID differ somewhat.

Why could this happen?

My source code for retrieving the headers (client is an open,
authenticated NNTP client):

    ArrayList<Header> headers = new ArrayList<Header>();
    // No cast is needed here: BufferedReader accepts any Reader.
    Reader reader = client.retrieveArticleInfo((int) rangeFrom, (int) rangeTo);
    BufferedReader stringReader = new BufferedReader(reader);
    String line;
    while ((line = stringReader.readLine()) != null)
    {
        try
        {
            // Tab-separated overview fields: article number, subject,
            // author, date, message ID, references, byte count, line count
            String[] header = line.split("\t");
            int number = Integer.parseInt(header[0]);
            String subject = header[1];
            String author = header[2];
            Date date = this.parseDate(header[3]);
            String messageID = header[4];
            int size = Integer.parseInt(header[6]);
            headers.add(new Header(number, subject, author,
                    date.getTime() / 1000, size, messageID, parser));
        }
        catch (Exception e)
        {
            // exception handling...
        }
    }

After sorting my Header objects, I end up with a lot of duplicates, as
described.

Is there any mistake in my code?

Kind regards, D.Penning


Re: Re: NNTP Client -> identical headers?

Posted by Daniel Penning <d....@fire-development.com>.
Hi Rory,

OK, thanks a lot for your reply.
I will take a look at the NNTPUtils class, and yes, it's a little confusing that these methods are tucked away in another class ;)

Yes, a lot of automation is involved when these articles are posted, but I'm still wondering about so many duplicates.
In fact, the duplicates are not exactly equal; in particular, their sizes differ by about 10-20 bytes. Is this size variance perhaps caused by the different message IDs and so on, or does the size describe only the actual content?

My task is to parse the header information and create a set of headers that can be downloaded later. So it is a problem if I don't know which of the two (or more) duplicates is the correct header.

Kind regards,
Daniel
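
To see which headers belong together, one option is to group them by subject and author and inspect each group. The following is only a minimal sketch under stated assumptions: the Header stand-in, its field names and findNearDuplicates are hypothetical and not part of Commons Net. It does not decide which duplicate is the "correct" one; it only surfaces the candidates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups near-duplicate headers. Hypothetical helper, not part of Commons Net. */
final class DuplicateGrouper {

    /** Minimal stand-in for the Header class used in this thread (assumed fields). */
    static final class Header {
        final long number;
        final String subject;
        final String author;
        final String messageId;
        final long size;

        Header(long number, String subject, String author, String messageId, long size) {
            this.number = number;
            this.subject = subject;
            this.author = author;
            this.messageId = messageId;
            this.size = size;
        }
    }

    /** Returns only those groups in which two or more headers share subject and author. */
    static Map<String, List<Header>> findNearDuplicates(List<Header> headers) {
        Map<String, List<Header>> groups = new LinkedHashMap<>();
        for (Header h : headers) {
            // Overview fields are tab-separated, so a tab is used as the key separator here.
            String key = h.subject + "\t" + h.author;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(h);
        }
        // Drop groups with a single member: those headers have no near-duplicate.
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        List<Header> headers = new ArrayList<>();
        headers.add(new Header(1, "file.bin (1/2)", "poster@example.com", "<a@example.com>", 1000));
        headers.add(new Header(2, "file.bin (1/2)", "poster@example.com", "<b@example.com>", 1015));
        headers.add(new Header(3, "other.bin (1/1)", "poster@example.com", "<c@example.com>", 500));

        for (List<Header> group : findNearDuplicates(headers).values()) {
            for (Header h : group) {
                System.out.println(h.number + " " + h.messageId + " " + h.size);
            }
        }
    }
}
```

Printing the message ID and size for each group member makes it easier to judge whether the entries are reposts of the same content or genuinely different articles.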



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org




Re: Re: NNTP Client -> identical headers?

Posted by Rory Winston <ro...@gmail.com>.
Hi Daniel

Your method looks correct - you are reading a bunch of tab-delimited 
lines and parsing the header values from that, which is what the 
NNTPUtils class does under the hood. You may find it easier to use the 
NNTPUtils::getArticleInfo() method to retrieve the information that you 
are currently parsing manually. I should move that class from the 
examples package to make that easier.


If this is a binary newsgroup, the duplication may simply be due to 
articles being posted by an automated process.

Rory
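
The tab-delimited parsing described above can be sketched in isolation, without a server connection. This is a minimal illustration, not the NNTPUtils implementation: OverviewLine and parse are hypothetical names, and the field order (article number, subject, author, date, message ID, references, byte count, line count) is assumed to be the standard overview layout. Unlike an empty catch block, it reports malformed lines instead of silently skipping them.

```java
/** One parsed overview (XOVER) line. Hypothetical helper, not part of Commons Net. */
final class OverviewLine {
    final long articleNumber;
    final String subject;
    final String author;
    final String date;       // left as a raw string in this sketch
    final String messageId;
    final long byteCount;

    private OverviewLine(long articleNumber, String subject, String author,
                         String date, String messageId, long byteCount) {
        this.articleNumber = articleNumber;
        this.subject = subject;
        this.author = author;
        this.date = date;
        this.messageId = messageId;
        this.byteCount = byteCount;
    }

    /** Parses one tab-separated line; throws IllegalArgumentException if it is malformed. */
    static OverviewLine parse(String line) {
        // A limit of -1 keeps empty fields, e.g. an empty references column.
        String[] f = line.split("\t", -1);
        if (f.length < 7) {
            throw new IllegalArgumentException(
                    "Expected at least 7 tab-separated fields, got " + f.length);
        }
        try {
            // f[5] (references) is skipped; f[6] is the byte count.
            return new OverviewLine(Long.parseLong(f[0].trim()), f[1], f[2], f[3],
                    f[4], Long.parseLong(f[6].trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Bad number in line: " + line, e);
        }
    }

    public static void main(String[] args) {
        String sample = "3000234\tI am just a test article\t\"Demo User\" <nobody@example.com>"
                + "\t6 Oct 1998 04:38:40 -0500\t<45223423@example.com>\t<45454@example.net>\t1234\t17";
        OverviewLine h = parse(sample);
        System.out.println(h.articleNumber + " " + h.messageId + " " + h.byteCount + " bytes");
    }
}
```

Logging or counting the lines that fail to parse shows whether any part of the response is being dropped during a large retrieval.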





Re: NNTP Client -> identical headers?

Posted by Daniel Penning <d....@fire-development.com>.
Hi Rory,

I'm not sure whether the header values are correct ;) I'm only wondering why there are so many headers that are, as described, nearly identical in their values.
Is my way of parsing the stream into individual headers a common way to do this job?

Thanks a lot, Daniel





Re: NNTP Client -> identical headers?

Posted by Rory Winston <ro...@gmail.com>.
Hi Daniel

I'm not quite sure what the problem is here - are you saying that the 
header values are incorrect for large article retrievals?

Rory


