

Home > Archive > PERL Beginners > September 2006 > Read everything from socket










Read everything from socket
aleko.petkov@gmail.com

2006-09-09, 3:57 am

Hi,

I want to read a bunch of lines from a socket. The problem is that the
getline method doesn't seem to detect the end of the stream; the loop
never returns. Here's the code I'm using:

my $line = '';
while( $line = $socket->getline )
{
    $msg .= $line;
}

This reads the response from a POP3 server. If I loop a fixed number of
times, everything works, so the problem is detecting the EOF.

Any ideas?

Thanks,

Aleko

Aaron Dougherty

2006-09-09, 3:57 am

The problem is that getline is a blocking call, meaning it will sit
around and hold up the program (or fork or thread) until it has a line
to return. The server keeps the connection open after its response, so
getline never sees an end of file, and the while will never exit.

I'm not intimately familiar with the POP3 protocol, but if I'm not
mistaken, there are various ways with it to know when you have reached
the end of a response. For example: the first line of a LIST response
indicates how many lines it will return, so you can break out of the
while when you know you got the last line, and the RETR response uses a
period on a line by itself to indicate the end of the message. So
something such as the following should work for you:

# Reading from RETR
while (my $line = $socket->getline) {
    last if $line =~ /^\.\r?$/;   # lone "." marks the end (lines end in CRLF)
    $msg .= $line;
}

# Reading from LIST
if ($socket->getline =~ /^\+OK\s(\d+) messages/) {
    my $message_count = $1;
    for (1 .. $message_count) {
        $msg .= $socket->getline;   # append each line instead of overwriting
    }
}
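
For the RETR case, a slightly fuller sketch of delimiter-based reading is given below. It assumes the server follows RFC 1939, under which the sender prefixes an extra "." to any message line that itself begins with ".", so a lone "." line should only ever be the terminator. The name read_multiline is made up for illustration, and the built-in <$fh> is used in place of $socket->getline so the sub works on any filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: read a dot-terminated multi-line response from $fh.
# Returns the body with byte-stuffing undone, or undef if the stream ended
# before the terminator line was seen.
sub read_multiline {
    my ($fh) = @_;
    my $body = '';
    while (defined(my $line = <$fh>)) {
        return $body if $line =~ /^\.\r?\n\z/;   # lone "." ends the response
        $line =~ s/^\.//;                        # strip the stuffed leading "."
        $body .= $line;
    }
    return;    # EOF before the terminator
}
```

Checking defined() rather than truth also avoids stopping early on a final line that happens to be "0".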



aleko.petkov@gmail.com

2006-09-11, 6:57 pm



Thanks Aaron,

That's basically the approach taken by the module I'm trying to work
with (Mail::Pop3Client); it reads until it hits a specific delimiter.

The problem I'm running into is that it doesn't handle malformed
messages properly, e.g. messages with '\n.\n' embedded. The script
stops parsing the body prematurely, and subsequent POP commands fail or
get the remainder of the body in response.

If I could just get the whole RETR response in a string I could parse
that easily. I just don't know how to detect the end of the response
stream.

Aaron Dougherty

2006-09-11, 9:57 pm

My pleasure. Unfortunately you're running into one of the limitations
of TCP/IP. There is no standard way to indicate a stream is finished
short of closing the connection (which Perl will handle properly).

Because of this, protocols typically provide their own ways to detect
the end of a response. What I would suggest in your example is to parse
the first response line from the RETR command:

+OK 1234 octets

Or from the LIST command

+OK 2 messages (1801)
1 1234
2 567

In both cases, the size in bytes is given in the response stream.

There are two ways to read a specific number of bytes. You can use
the read function:
http://perldoc.perl.org/functions/read.html

This is probably the best method.

if ($retr =~ /^\+OK\s(\d+)\soctets/) {
    my $total_size = $1;
    my $bytes_read = 0;
    my $response   = '';
    while ($bytes_read < $total_size) {
        my $got = $socket->read(my $data, 1024);
        last unless $got;   # undef on error, 0 on end of file
        $response   .= $data;
        $bytes_read += $got;
    }
}

The other way you could do it is with the same getline method, and
use length like in the previous example to count how many bytes you
have read in so far. Once you get the expected response length, you
know the stream is complete.
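
A rough sketch of that second approach might look like the following. The name read_lines_until is hypothetical, and the sketch assumes a byte-oriented handle (so length counts bytes) and a size that falls on a line boundary. The built-in <$fh> stands in for $socket->getline.

```perl
use strict;
use warnings;

# Hypothetical helper: read whole lines from $fh until at least $total
# bytes have been collected, then return everything read so far.
sub read_lines_until {
    my ($fh, $total) = @_;
    my $response = '';
    while (length($response) < $total) {
        my $line = <$fh>;
        last unless defined $line;    # EOF or error: stop rather than spin
        $response .= $line;
    }
    return $response;
}
```

Because it reads line by line, this can overshoot $total if the size does not land exactly at the end of a line.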

Hope that helps :)

-Aaron



aleko.petkov@gmail.com

2006-09-11, 9:57 pm



Perfect! I ended up using method #1, but I had to reduce the buffer
size to 1 because with anything greater the socket just blocks after a
few reads, waiting for data. This probably slows things down, but not
noticeably, so I'll live with it for now.

Also, I had to add an extra pair of getline() calls after the loop to
take care of the NL.NL, which the POP3 server appends to the end of
the message.
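
The stall with larger buffers is most likely because Perl's buffered read waits until it has the full requested length (or hits end of file), so asking for more bytes than the server will send blocks on a connection that stays open. One possible way around it, sketched here, is never to request more than what remains. The name read_exact and the default chunk size of 4096 are assumptions for illustration, not part of Mail::Pop3Client.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $total bytes from $fh in chunks of at
# most $chunk bytes. Returns the data, or undef on error or early EOF.
sub read_exact {
    my ($fh, $total, $chunk) = @_;
    $chunk ||= 4096;
    my $buf = '';
    while (length($buf) < $total) {
        my $remaining = $total - length($buf);
        my $want = $remaining < $chunk ? $remaining : $chunk;
        # The fourth argument is an offset, so new data is appended to $buf
        my $got = read($fh, $buf, $want, length($buf));
        return unless $got;    # undef = error, 0 = EOF before $total bytes
    }
    return $buf;
}
```

With a socket this would be called as read_exact($socket, $total_bytes), leaving the terminator lines unread for the getline calls that follow.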

So, here's my addition to the Pop3Client module, that safely retrieves
raw messages, even if they are malformed:

sub RetrieveRaw
{
    my $me  = shift;
    my $num = shift;
    my $msg = '';

    $me->_checkstate('TRANSACTION', 'RETR') or return;
    $me->_sockprint( "RETR $num", $me->EOL );
    my $line = $me->_sockread();
    unless (defined $line) {
        $me->Message("Socket read failed for RETR");
        return;
    }

    # read response & figure out how much data to expect
    #chomp $line;

    unless ($line =~ /^\+OK\s(\d+)\soctets/) {
        $me->Message("Bad return from RETR: $line");
        return;
    }
    my $total_bytes = $1;
    my $bytes_read  = 0;
    my $data        = '';

    # read message text
    while( $bytes_read < $total_bytes )
    {
        # TODO: use a larger buffer size (but without blocking)
        my $got = $me->Socket()->read( $data, 1 );
        last unless $got;    # stop on error or end of file instead of looping forever
        $msg .= $data;
        $bytes_read += $got;
    }

    # now consume EOF marker (\n.\n)
    $me->Socket()->getline;
    $me->Socket()->getline;

    return $msg;
}

This fetches the entire message, including headers, body, and
attachments. You can then split it using something like this:

my $sep = "\r\n\r\n";
my $divider_pos = index($headandbody, $sep);
if ($divider_pos == -1)    # index returns -1, not 0, when nothing is found
{
    $sep = "\n\n";
    $divider_pos = index($headandbody, $sep);
}

if ($divider_pos > 0)
{
    $head = substr($headandbody, 0, $divider_pos);
    $body = substr($headandbody, $divider_pos + length($sep));
}
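
As an alternative sketch, the same split can be done with one regular expression that accepts either line ending. The name split_message is hypothetical. Unlike the index version, which looks for "\r\n\r\n" anywhere before falling back to "\n\n", this splits at whichever blank line comes first.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw message at the first blank line.
# Returns (head, body); body is undef when no blank line is present.
sub split_message {
    my ($raw) = @_;
    # Limit of 2 keeps any later blank lines inside the body
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return ($head, $body);
}
```

It would be called as my ($head, $body) = split_message($headandbody);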

Hope this benefits someone :)

Thanks again, Aaron.

-Aleko


Copyright 2009 codecomments.com