Maximum copy speed on a full duplex ethernet connection?
David Mathog

2008-01-17, 7:14 pm

On a 100baseT ethernet connection with MTU 1500 and Full Duplex set (on
Mandriva 2007.1, 2.6.19.3 kernel, if it matters) I have code which
can do:

Send -> Read (full speed node A to B)
Send -> Read (full speed node B to C)
but
Send -> Read (A to B)
V echo V
Send -> Read (B to C, total chain, not full speed)

Where "full speed" is around 11.7 Mbytes/sec and "not full speed" is a
lot less, at best a bit below 8 Mbytes/sec. So far nothing has been
better than the simplest possible method, which was read as much as the
input socket buffer will return in one pass, send all of that to the
output socket buffer, and repeat. The internal code is fast enough that
the center node B can empty the input buffer at the MTU size. That is,
it can read the input buffer, write it to the output buffer, and get
back to the input buffer fast enough so that the input buffer holds only
a single packet of data (1448 bytes, which is 1500 minus the headers).
At least most of the time it can. Presumably it is writing at the same
rate since in the first test there was no problem writing at "full
speed". The odd thing is that every so often it stutters a bit, and I'm
guessing that that is where the slow down occurs. Here is a log of
the read (rrnn)/write (wrnn) sizes from the input and output buffers on
the central node B:

DEBUG rrnn returns 1448
DEBUG wrnn returns 1448
DEBUG rrnn returns 1448
DEBUG wrnn returns 1448
DEBUG rrnn returns 1448
DEBUG wrnn returns 1448
DEBUG rrnn returns 1448
DEBUG wrnn returns 1448
DEBUG rrnn returns 15928 <-- we sat for 10 packets
DEBUG wrnn returns 15928 <-- no problem writing all to out
DEBUG rrnn returns 1448
DEBUG wrnn returns 1448
DEBUG rrnn returns 1448
DEBUG wrnn returns 1448

There are very few of these stutter points on the A->B test (for
instance), and when they did show up the input buffer held only 2*1448
bytes. So the stutter was at most one packet interval long when running
in that mode. The stutter isn't due to something else on the system or
both tests would show it. There's something about copying that seems to
trigger this.
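
A minimal sketch of the "read whatever the input socket returns, write all of it, repeat" loop described above. It assumes blocking descriptors, and the names relay and write_all are hypothetical; this is not the original program's code.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy in_fd to out_fd until end of file on in_fd.
 * Returns the total number of bytes copied, or -1 on error. */
static long long relay(int in_fd, int out_fd)
{
    /* Assumption: sized like the receive buffer quoted in the post. */
    static char buf[196608];
    long long total = 0;

    for (;;) {
        /* One pass: take whatever the kernel has queued right now. */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

Logging the value of n on each pass would produce a trace like the rrnn/wrnn log shown earlier.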

The sizes of the socket buffers are:

recv: 196608
send: 87380

I know the input buffer isn't filling, and suspect that the output
buffer is never full either.
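
For reference, buffer sizes like the ones above can be read with getsockopt. This is only a sketch; the helper name get_sock_bufs is hypothetical.

```c
#include <sys/socket.h>

/* Fetch the kernel's receive and send buffer sizes for a socket.
 * Returns 0 on success, -1 on error (errno set by getsockopt). */
static int get_sock_bufs(int fd, int *rcv, int *snd)
{
    socklen_t len = sizeof *rcv;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv, &len) < 0)
        return -1;

    len = sizeof *snd;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd, &len) < 0)
        return -1;

    return 0;
}
```

These calls report the configured buffer sizes, not how much data is currently queued, so on their own they cannot show whether either buffer ever fills.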

Here's what ifconfig shows on the center node:

eth0 Link encap:Ethernet HWaddr 00:E0:81:22:CC:3D
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::2e0:81ff:fe22:cc3d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:35461150 errors:0 dropped:0 overruns:24943 frame:0
TX packets:8000252 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4283086317 (3.9 GiB) TX bytes:1964265393 (1.8 GiB)
Interrupt:16

So who can tell me what is causing this "stutter" and how to eliminate
it, so that the input -> echo -> output rate can be improved a bit?

Thanks,

David Mathog

Spoon

2008-01-22, 8:22 am

David Mathog wrote:

> [...]
> So who can tell me what is causing this "stutter" and how to eliminate
> it, so that the input -> echo -> output rate can be improved a bit?


Does this still happen if you change the scheduling policy to SCHED_RR?
(Warning: you might hang the box if you're not careful.)
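
A sketch of one way to request SCHED_RR for the calling process, at the lowest real-time priority. The function name try_sched_rr is hypothetical. The call normally needs root or the matching privilege and otherwise fails, typically with EPERM.

```c
#include <sched.h>

/* Ask for SCHED_RR at the lowest real-time priority for the calling
 * process (pid 0 means "this process").
 * Returns 0 on success, -1 on failure with errno set. */
static int try_sched_rr(void)
{
    struct sched_param sp;
    int prio = sched_get_priority_min(SCHED_RR);

    if (prio < 0)
        return -1;
    sp.sched_priority = prio;
    return sched_setscheduler(0, SCHED_RR, &sp);
}
```

A real-time process that never blocks can starve everything else on its CPU, which is the hang risk mentioned above.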

David Mathog

2008-01-22, 7:27 pm

Spoon wrote:
> David Mathog wrote:
>
>
> Does this still happen if you change the scheduling policy to SCHED_RR?
> (Warning: you might hang the box if you're not careful.)


Thanks for the suggestion, but I'm not going to try that: this has to
run as a normal process.

I sort of found the problem. The program had a parameter minwrite which
controlled how much data had to accumulate before it was written in bulk
to the output socket. Setting minwrite up to 9 X MTU size achieved full
throughput (and peaked there, it decreased above that). This suggested
that TCP_NODELAY might be set on the data sockets, instead of only on
the message sockets, and that turned out to be the case. Turning
TCP_NODELAY off on the data sockets eliminated the need for fine tuning
of the minwrite variable.
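
A sketch of toggling and checking TCP_NODELAY on a socket. The helper names set_nodelay and get_nodelay are hypothetical and not taken from the program discussed here.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn TCP_NODELAY on (on != 0) or off (on == 0) for a TCP socket.
 * With the option off, the Nagle algorithm may coalesce small writes.
 * Returns 0 on success, -1 on error. */
static int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Report the current setting: 1 if set, 0 if clear, -1 on error. */
static int get_nodelay(int fd)
{
    int on = 0;
    socklen_t len = sizeof on;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) < 0)
        return -1;
    return on != 0;
}
```

Under this sketch the data sockets would get set_nodelay(fd, 0) and only the message sockets set_nodelay(fd, 1). Calling get_nodelay on each socket at startup is a cheap way to catch the option being set where it was not intended.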

My guess is that on a chain A->B->C the slowdown seen at B has something
to do with ACKs coming back from C interfering with data coming from A.
The odd thing is, by looking at the "packets" values in ifconfig,
neither RX nor TX were very different when the program was working well
than when it was working poorly. The only thing that seemed to be
consistent is that when the program was working well the RX and TX
packet numbers were (slightly) closer to each other than when it was
slower. For instance, when the transfer rate was 8.6 Mbytes/sec
(TCP_NODELAY set, minwrite=1) the RX/TX counts were 41887/42441,
but when TCP_NODELAY was off and/or minwrite set to 9 MTUs, so that the
speed maxed out at around 11.4 Mbytes/sec, the counts were 41744/41822. The TX
numbers varied quite a bit and with no apparent relation to speed, and
it is hard to believe that 143 RX packets out of 41887 would make such a
huge speed difference, so I think maybe something else is going on. For
instance, that node C may be clustering ACKs into one packet, and
perhaps ifconfig shows each of those ACKs as a packet, even though they
really all come in together in a single packet. That is, perhaps the
actual RX packet count is somewhat lower when things are working well
than ifconfig displays.

Or maybe the issue is in the ethernet switch and has little or nothing
to do with the nodes.

Regards,

David Mathog
