Code Comments
Hi,

I'm trying to communicate with a given server via HTTP. I'm sending the request and receiving the response. But there is one problem: how do I know how much data to expect?

A solution that came to mind is to use non-blocking I/O. I tried that, but read() or recv() returns before any data is read, which is no use.

How can I read the entire response from the server without knowing how much data to expect?

Thanks,
Daniel

Gigi <dandago@gmail.com> writes:
> I'm trying to communicate with a given server via HTTP. I'm sending
> the request and receiving the response. But there is one problem...
> how do I know how much data to expect?
>
> A solution that came to mind is to use non-blocking I/O. I tried that,
> but read() or recv() returns before any data is read, which is no use.
>
> How can I read the entire response from the server without knowing how
> much data to expect?

Not at all. The amount of data to expect is either implicit in the response type (e.g. 204 or 304 responses), implicitly signalled by the server closing the connection, explicitly given by a Content-Length header, or marked by a zero-sized 'chunk of response data' (chunked transfer-coding).

If you don't need persistent connections, using HTTP/1.0 and always closing your side of the connection [shutdown(fd, SHUT_WR)] after a single request has been sent 'should' cause the server to always use the 'closing the connection' method. In particular, you wouldn't have to deal with chunked transfer-coding then.
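
The four cases listed above can be sketched as a small decision function. This is only an illustration: `struct response_info`, `enum body_framing` and `classify_body` are hypothetical names, and the struct is assumed to have been filled in by whatever code parsed the status line and headers. The order of the checks follows RFC 2616 section 4.4; the multipart/byteranges case from that section is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How the end of an HTTP response body is signalled. */
enum body_framing {
    BODY_NONE,            /* no body at all (1xx, 204, 304, reply to HEAD) */
    BODY_CHUNKED,         /* body ends at a zero-sized chunk */
    BODY_CONTENT_LENGTH,  /* body is exactly Content-Length bytes long */
    BODY_UNTIL_CLOSE      /* body ends when the server closes the connection */
};

/* Hypothetical summary of an already-parsed response header. */
struct response_info {
    int  status;           /* status code, e.g. 200, 204, 304 */
    bool is_head_request;  /* true if the request method was HEAD */
    bool chunked;          /* "Transfer-Encoding: chunked" was present */
    long content_length;   /* Content-Length value, or -1 if absent */
};

/* Decide how to find the end of the body, most specific rule first. */
enum body_framing classify_body(const struct response_info *r)
{
    assert(r != NULL);

    /* Responses that never carry a body, whatever the headers say. */
    if (r->is_head_request ||
        (r->status >= 100 && r->status < 200) ||
        r->status == 204 || r->status == 304)
        return BODY_NONE;

    /* Chunked transfer-coding takes precedence over Content-Length. */
    if (r->chunked)
        return BODY_CHUNKED;

    if (r->content_length >= 0)
        return BODY_CONTENT_LENGTH;

    /* Nothing else applies: read until the server closes the connection. */
    return BODY_UNTIL_CLOSE;
}
```

Note that a reply to a HEAD request may well carry a Content-Length header, but no body follows it, which is why that check comes first.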

On Mar 25, 7:01 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Gigi <dand...@gmail.com> writes:
>
> Not at all. The amount of data to expect is either implicit in the
> response type (eg 204 or 304 responses), implicitly signalled by the
> server closing the connection, explicitly given by a
> Content-Length-header or marked by a zero-sized 'chunk of response
> data' (chunked transfer-coding).
>
> If you don't need persistent connections, using HTTP/1.0 and always
> closing a connection [shutdown(fd, SHUT_WR)] after a single request
> has been sent 'should' cause the server to always use the 'closing the
> connection' method. Especially, you wouldn't have to deal with chunked
> transfer-coding then.

Hi,

Thanks for the reply, but I didn't quite understand what I need to do to get things working. As in, if the amount of data to expect is somehow sent by the server, how do I access this amount?

Right now I'm setting my socket descriptor to non-blocking and attempting to recv(), but it's failing with EAGAIN (Resource temporarily unavailable) and not receiving any data. On the other hand I can use blocking I/O, but this brings me back to the problem that I don't know how much data to expect.

> Thanks for the reply, but I didn't quite understand what I need to do
> to get things working. As in, if the amount of data to expect is
> somehow sent by the server, how do I access this amount?

As Rainer said, you should fetch it from the 'Content-Length' header of the HTTP response. This header tells you how much data to expect after the HTTP headers.

So you should:
1- Read the HTTP headers, i.e. when receiving data keep on reading the stream until you reach "\r\n\r\n". This is the header termination character sequence.
2- Parse the headers received and look for Content-Length; extract the length from it.
3- Resume reading as many bytes as the header announced.

The other method described by Rainer is:
0- Specify HTTP/1.0 as your protocol in the request you send to the HTTP server.
1- Read the HTTP headers, i.e. when receiving data keep on reading the stream until you reach "\r\n\r\n". This is the header termination character sequence.
2- Resume reading all bytes sent by the server until it closes the connection.

Of course you should get the HTTP RFC, number 2616; it deals with headers, parsing, etc.

> Right now I'm setting my socket descriptor to non-blocking and
> attempting to recv(), but it's failing with EAGAIN (Resource
> temporarily unavailable) and not receiving any data. On the other hand
> I can use blocking I/O, but this brings me back to the problem that I
> don't know how much data to expect.

Async vs. sync I/O has nothing to do with your initial problem: it is a different way to process incoming data and nothing else.

A byte stream (like TCP) is a byte stream: it does not convey any information regarding message sizes, separators, etc. As such, do not expect to find message boundaries or sizes using the socket API (well, unless you use UDP, but that's a different story). HTTP, running on top of TCP, gives you this information in the header or by closing the connection.

--
paulo
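
The second method (read until the server closes the connection) might look like the sketch below, assuming a blocking socket and a caller-supplied buffer. `read_until_close` is a hypothetical helper, not part of any library. recv() returning 0 is how an orderly close by the peer shows up on a stream socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read from fd into buf until the peer closes the connection
 * (recv returns 0) or the buffer is full.
 * Returns the number of bytes stored, or -1 on a receive error. */
ssize_t read_until_close(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    assert(buf != NULL);
    while (used < cap) {
        ssize_t n = recv(fd, buf + used, cap - used, 0);
        if (n == 0)
            break;                 /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, retry */
            return -1;
        }
        used += (size_t)n;
    }
    return (ssize_t)used;
}
```

A return value equal to `cap` is ambiguous in this sketch: the response may have been longer than the buffer. Real code would grow the buffer or process the data as it arrives.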

Gigi <dandago@gmail.com> writes:
>Thanks for the reply, but I didn't quite understand what I need to do
>to get things working. As in, if the amount of data to expect is
>somehow sent by the server, how do I access this amount?

Parse the HTTP headers. Look for the Content-Length: header.

cf. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

scott
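
A rough sketch of that lookup, assuming the complete header block is already in memory and is not necessarily zero-terminated. `parse_content_length` is a hypothetical helper. Header names are case-insensitive in HTTP, so the comparison uses POSIX strncasecmp(); folded (continued) header lines are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Scan a header block line by line for "Content-Length:" and return its
 * value, or -1 if the header is absent or malformed. hdr does not need
 * to be zero-terminated; only the first len bytes are examined. */
long parse_content_length(const char *hdr, size_t len)
{
    static const char name[] = "Content-Length:";
    const size_t nlen = sizeof name - 1;
    size_t i = 0;

    assert(hdr != NULL);
    while (i < len) {
        /* Here i is always at the start of a line. */
        if (len - i >= nlen && strncasecmp(hdr + i, name, nlen) == 0) {
            size_t j = i + nlen;
            long value = 0;
            int digits = 0;

            while (j < len && (hdr[j] == ' ' || hdr[j] == '\t'))
                j++;                       /* skip optional whitespace */
            while (j < len && isdigit((unsigned char)hdr[j])) {
                if (value > (LONG_MAX - 9) / 10)
                    return -1;             /* would overflow a long */
                value = value * 10 + (hdr[j] - '0');
                digits++;
                j++;
            }
            return digits > 0 ? value : -1;
        }
        /* Advance to the start of the next line. */
        while (i < len && hdr[i] != '\n')
            i++;
        i++;                               /* step over the '\n' */
    }
    return -1;
}
```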

OK, regarding Content-Length I got the idea... but that only gives you how much data there is in the payload. First you need to know how much data is in the header.

What I've been doing so far was to call recv() and give it a certain number: an expected size for the whole data (header + payload). So from what you're saying I suppose I have to call recv() repeatedly with 1 as the size to receive? (i.e. receive one byte at a time until \r\n\r\n is reached)

I don't know if I'm understanding you correctly, but if I am, isn't this inefficient? I mean, you're making a system call for every byte, which I suppose causes considerable overhead because of continuous context switching.

On Mar 25, 3:26 pm, Gigi <dand...@gmail.com> wrote:
> What I've been doing so far was call recv() and giving it a certain
> number... an expected size for the whole data (header + payload). So
> from what you're saying I suppose I have to call recv() repeatedly
> with 1 as the size to receive? (i.e. receive one byte at a time until
> \r\n\r\n is reached)

> I don't know if I'm understanding you correctly, but if I am, isn't
> this inefficient? I mean, you're calling a system call for every byte,
> which I suppose causes considerable overhead because of continuous
> context switching.

That would be horrible, don't do that. Use the following logic:

1) Call 'recv' passing it a reasonable amount of bytes to get.

2) See if you have a complete header, by searching for a '\r\n\r\n' in the data you got. (Caution, do not use 'strstr' until/unless you zero-terminate the data!)

3) If you don't have a complete header, you definitely don't have a complete response. Go to 1.

4) If you do have a complete header, check if you have all the data too. If not, go to 1.

DS
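
Steps 1 to 3 above can be sketched roughly as follows, assuming a blocking socket. `find_header_end` and `read_header` are hypothetical names. The search uses memcmp() over an explicit length instead of strstr(), so the buffer never has to be zero-terminated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return the offset of the first body byte (just past "\r\n\r\n"),
 * or -1 if the terminator is not in the first len bytes of buf. */
long find_header_end(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}

/* Keep calling recv with all remaining buffer space until the
 * accumulated data contains a complete header. On success returns the
 * header length; *used then holds the total bytes received, which may
 * already include part of the body. Returns -1 on error, on EOF before
 * the header is complete, or when the buffer fills up. */
long read_header(int fd, char *buf, size_t cap, size_t *used)
{
    assert(buf != NULL && used != NULL);
    *used = 0;
    while (*used < cap) {
        ssize_t n = recv(fd, buf + *used, cap - *used, 0);
        if (n < 0 && errno == EINTR)
            continue;              /* interrupted by a signal, retry */
        if (n <= 0)
            return -1;             /* error, or peer closed too early */
        *used += (size_t)n;

        long end = find_header_end(buf, *used);
        if (end >= 0)
            return end;
    }
    return -1;
}
```

For step 4, the bytes between the returned header length and `*used` are body data that arrived with the header and must be counted against the expected total. The sketch rescans the buffer from the start after every recv(), which is simple but does redundant work for large headers.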

On Mar 25, 11:31 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Mar 25, 3:26 pm, Gigi <dand...@gmail.com> wrote:
>
> That would be horrible, don't do that. Use the following logic:
>
> 1) Call 'recv' passing it a reasonable amount of bytes to get.
>
> 2) See if you have a complete header, by searching for a '\r\n\r\n' in
> the data you got. (Caution, do not use 'strstr' until/unless you
> zero-terminate the data!)
>
> 3) If you don't have a complete header, you definitely don't have a
> complete response. Go to 1.
>
> 4) If you do have a complete header, check if you have all the
> data too. If not, go to 1.
>
> DS

OK, so let's say I receive 8 bytes, and the \r\n\r\n is at the beginning of it. Let's assume that the payload is less than 4 bytes long (for example, when issuing a HEAD request). Then I am expecting to read more data than I will actually receive, which means recv() will block.

Another problem is that the \r\n\r\n can be split over two 8-byte fragments, but that's not really a big deal... a bit of extra code will cater for this situation.

Of course, your method will work perfectly for a response with a decent payload. But if it can work universally, even for HEAD requests, then it would be so much better.

In article <60793e0f-7ee8-4111-ba5d-7c3b7f2b237f@s8g2000prg.googlegroups.com>, Gigi <dandago@gmail.com> wrote:
> OK, so let's say I receive 8 bytes, and the \r\n\r\n is at the
> beginning of it. Let's assume that the payload is less than 4 bytes
> long (for example, when issuing a HEAD request). Then I am expecting
> to read more data than I will actually receive, which means recv()
> will block.

recv() only blocks if there's NOTHING available. If you give a 1000-byte buffer, and the server sends 40 bytes, recv() will return those 40 bytes immediately and you won't block.

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
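
This is easy to check locally. The sketch below uses a Unix-domain socketpair() as a stand-in for a TCP connection (an assumption made to keep the example self-contained; both are stream sockets). `demo_short_read` is a hypothetical helper that sends `send_len` bytes on one end and then does a single blocking recv() with a `recv_cap`-byte buffer on the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send send_len bytes over a local stream socket pair, then issue one
 * blocking recv asking for up to recv_cap bytes. Returns what recv
 * returned, or -1 if the setup failed. */
ssize_t demo_short_read(size_t send_len, size_t recv_cap)
{
    int sv[2];
    char payload[1024];
    char buf[4096];
    ssize_t n;

    assert(send_len <= sizeof payload && recv_cap <= sizeof buf);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    memset(payload, 'x', send_len);
    if (send(sv[0], payload, send_len, 0) != (ssize_t)send_len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Asks for recv_cap bytes, but returns as soon as any data is there. */
    n = recv(sv[1], buf, recv_cap, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling `demo_short_read(40, 1000)` returns 40 without blocking: the requested size is only an upper bound.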

On Mar 25, 3:59 pm, Gigi <dand...@gmail.com> wrote:
> OK, so let's say I receive 8 bytes, and the \r\n\r\n is at the
> beginning of it. Let's assume that the payload is less than 4 bytes
> long (for example, when issuing a HEAD request). Then I am expecting
> to read more data than I will actually receive, which means recv()
> will block.

You can't receive 8 bytes unless the other end sends 8 bytes. If you mean the 'recv' will block trying to receive 8 bytes, no, it will not. The default behavior is only to block if no data is available.

> Another problem is that the \r\n\r\n can be split over two 8-byte
> fragments, but that's not really a big deal... a bit of extra code
> will cater for this situation.

I'm not sure what you mean. You can always get data broken up any which way from a TCP connection, that's the nature of TCP. You might get a single byte with just a '\r' and then a single byte with just a '\n'. TCP is a byte-stream protocol.

> Of course, your method will work perfectly for a response with a
> decent payload. But if it can work universally, even for HEAD
> requests, then it would be so much better.

I'm not sure I understand why you think it wouldn't. Is it just because you think 'recv' will normally block even if data is available? If so, what do you think MSG_WAITALL is for?

DS
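
One way to cope with a terminator that arrives in arbitrary pieces, as an alternative to rescanning an accumulated buffer, is to carry the match state across recv() calls. The sketch below is only an illustration; `struct crlf_matcher` and `crlf_feed` are hypothetical names. It remembers how many bytes of "\r\n\r\n" have matched so far, so it does not matter where the stream was cut.

```c
#include <assert.h>
#include <stddef.h>

/* Number of leading bytes of "\r\n\r\n" matched so far (0..3). */
struct crlf_matcher {
    int matched;
};

/* Feed the next chunk of the stream. Returns the offset within this
 * chunk just past the end of "\r\n\r\n", or -1 if the terminator has
 * not been completed yet. State is kept in *m between calls. */
long crlf_feed(struct crlf_matcher *m, const char *chunk, size_t len)
{
    static const char term[] = "\r\n\r\n";

    assert(m != NULL && chunk != NULL);
    for (size_t i = 0; i < len; i++) {
        if (chunk[i] == term[m->matched]) {
            m->matched++;
            if (m->matched == 4) {
                m->matched = 0;        /* ready for the next message */
                return (long)(i + 1);
            }
        } else {
            /* A mismatching '\r' may itself start a new terminator. */
            m->matched = (chunk[i] == '\r') ? 1 : 0;
        }
    }
    return -1;
}
```

Each chunk is fed exactly as recv() delivered it, even if that is a single '\r' followed later by a single '\n'.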