netdev.vger.kernel.org archive mirror
* TCP sender stuck in persist despite peer advertising non-zero window
@ 2020-10-15 18:23 Apollon Oikonomopoulos
  2020-10-15 20:22 ` Neal Cardwell
  0 siblings, 1 reply; 11+ messages in thread
From: Apollon Oikonomopoulos @ 2020-10-15 18:23 UTC (permalink / raw)
  To: netdev

Hi,

I'm trying to debug a (possible) TCP issue we have been encountering
sporadically during the past couple of years. Currently we're running
4.9.144, but we've been observing this since at least 3.16.

TL;DR: I believe we are seeing a case where snd_wl1 fails to be properly
updated, leading to an inability to recover from the TCP persist state,
and I would appreciate some help debugging this.

The long version:

The issue manifests with the client → server direction of an rsync
pipeline being stuck in TCP persist even though the server actually
advertises a non-zero window. The stall is not readily reproducible,
although it happens quite often (with a ~10% probability I'd say) when a
cluster of 4 machines tries to rsync an 800GB dataset from a single
server at the same time.

For those not familiar with the way rsync works, it essentially creates
a self-throttling, blocking pipeline using both directions of a single
TCP stream to connect the stages:

                 C               S              C
              generator -----> sender -----> receiver
                          A             A'

      [C: Client, S: Server, A & A': TCP stream directions]

Direction A carries file checksum data for the sender to decide what to
send, and A' carries file data for the receiver to write to disk. It's
always A that ends up in persist mode, while A' works normally. When the
zero-window condition hits, eventually the whole transfer stalls because
the generator does not send out metadata and the server has nothing more
to process and send to the receiver. When this happens, the socket on C
looks like this:

 $ ss -mito dst :873

 State      Recv-Q Send-Q                Local Address:Port                                 Peer Address:Port                
 ESTAB      0      392827               2001:db8:2a::3:38022                             2001:db8:2a::18:rsync                 timer:(persist,1min56sec,0)
	 skmem:(r0,rb4194304,t0,tb530944,f3733,w401771,o0,bl0,d757) ts sack cubic wscale:7,7 rto:204 backoff:15 rtt:2.06/0.541 ato:40 mss:1428 cwnd:10 ssthresh:46 bytes_acked:22924107 bytes_received:100439119971 segs_out:7191833 segs_in:70503044 data_segs_out:16161 data_segs_in:70502223 send 55.5Mbps lastsnd:16281856 lastrcv:14261988 lastack:3164 pacing_rate 133.1Mbps retrans:0/11 rcv_rtt:20 rcv_space:2107888 notsent:392827 minrtt:0.189


while the socket on S looks like this:

 $ ss -mito src :873

 State      Recv-Q Send-Q                Local Address:Port                                 Peer Address:Port                
 ESTAB      0      0                   2001:db8:2a::18:rsync                              2001:db8:2a::3:38022                 timer:(keepalive,3min7sec,0)
  	 skmem:(r0,rb3540548,t0,tb4194304,f0,w0,o0,bl0,d292) ts sack cubic wscale:7,7 rto:204 rtt:1.234/1.809 ato:40 mss:1428 cwnd:1453 ssthresh:1431 bytes_acked:100439119971 bytes_received:22924106 segs_out:70503089 segs_in:7191833 data_segs_out:70502269 data_segs_in:16161 send 13451.4Mbps lastsnd:14277708 lastrcv:16297572 lastack:7012576 pacing_rate 16140.1Mbps retrans:0/794 rcv_rtt:7.5 rcv_space:589824 minrtt:0.026

There's a non-empty send queue on C, while S obviously has enough space
to accept new data. Also note the difference between lastsnd and lastrcv
on C. tcpdump reveals the ZWP exchange between C and S:

  […]
  09:34:34.165148 0c:c4:7a:f9:68:e4 > 0c:c4:7a:f9:69:78, ethertype IPv6 (0x86dd), length 86: (flowlabel 0xcbf6f, hlim 64, next-header TCP (6) payload length: 32) 2001:db8:2a::3.38022 > 2001:db8:2a::18.873: Flags [.], cksum 0x711b (incorrect -> 0x4d39), seq 4212361595, ack 1253278418, win 16384, options [nop,nop,TS val 2864739840 ecr 2885730760], length 0
  09:34:34.165354 0c:c4:7a:f9:69:78 > 0c:c4:7a:f9:68:e4, ethertype IPv6 (0x86dd), length 86: (flowlabel 0x25712, hlim 64, next-header TCP (6) payload length: 32) 2001:db8:2a::18.873 > 2001:db8:2a::3.38022: Flags [.], cksum 0x1914 (correct), seq 1253278418, ack 4212361596, win 13831, options [nop,nop,TS val 2885760967 ecr 2863021624], length 0
  [… repeats every 2 mins]

S responds with a non-zero window (13831 << 7 after scaling), but C
never seems to pick it up. I dumped the internal connection state by
hooking the bottom of tcp_ack() with the attached systemtap script,
which reveals the following:

 ack: 4212361596, ack_seq: 1253278418, prior_snd_una: 4212361596
 sk_send_head seq:4212361596, end_seq: 4212425472
 snd_wnd: 0, tcp_wnd_end: 4212361596, snd_wl1: 1708927328
 flag: 4100, may update window: 0
 rcv_tsval: 2950255047, ts_recent: 2950255047

Everything seems to check out, apart from the (strange?) fact that
ack_seq < snd_wl1 by some 450MB, which AFAICT leads
tcp_may_update_window() to reject the update:

  static inline bool tcp_may_update_window(const struct tcp_sock *tp,
                                          const u32 ack, const u32 ack_seq,
                                          const u32 nwin)
  {
          return  after(ack, tp->snd_una) ||
                  after(ack_seq, tp->snd_wl1) ||
                  (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd);
  }

If I understand correctly, the only ways to recover from a zero window
in a case like this (where ack cannot advance past snd_una) would be
for ack_seq to equal snd_wl1 while a larger window is advertised, or
for after(ack_seq, tp->snd_wl1) to be true, neither of which holds in
our case.
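
To make the failure concrete, here is a minimal userspace sketch of the
kernel's wrapped sequence comparison (modeled on the before()/after()
helpers in include/net/tcp.h), plugged with the snd_wl1 and ack_seq
values from the dump above:

  #include <stdint.h>
  #include <stdio.h>

  /* The same circular comparison the kernel uses in include/net/tcp.h. */
  static int before(uint32_t seq1, uint32_t seq2)
  {
          return (int32_t)(seq1 - seq2) < 0;
  }
  #define after(seq2, seq1)  before(seq1, seq2)

  int main(void)
  {
          uint32_t snd_wl1 = 1708927328;  /* stale snd_wl1 from the dump */
          uint32_t ack_seq = 1253278418;  /* seq of the incoming ZWP reply */

          /* ack_seq sits ~450MB "behind" snd_wl1 in 32-bit circular
           * arithmetic, so this prints 0: the non-zero window in the
           * ZWP response is silently discarded. */
          printf("after(ack_seq, snd_wl1) = %d\n", after(ack_seq, snd_wl1));
          return 0;
  }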

Overall it looks like snd_wl1 stopped advancing at some point and the
peer sequence numbers wrapped in the meantime as traffic in the opposite
direction continued. Every single one of the 5 hung connections I've
seen in the past week is in this state, with ack_seq < snd_wl1. The
problem is that - at least to my eyes - there's no way snd_wl1 could
*not* advance when processing a valid ACK, so I'm really stuck here (or
I'm completely misled and talking nonsense :) Any ideas?

Some potentially useful details about the setup and the issue:

 - All machines currently run Debian Stretch with kernel 4.9.144; we
   have been seeing this since at least Linux 3.16.

 - We've witnessed the issue with different hardware (servers &
   NICs). Currently all NICs are igb, but we've had tg3 on both sides at
   some point and still experienced hangs. We tweaked TSO settings in
   the past and it didn't seem to make a difference.

 - It correlates with network congestion. We can never reproduce this
   with a single client, but it happens when all 4 machines try to rsync
   at the same time. Also limiting the bandwidth of the transfers from
   userspace makes the issue less frequent.

Regards,
Apollon

P.S: Please Cc me on any replies as I'm not subscribed to the list.


[-- Attachment #2: zwp.stp --]
[-- Type: text/plain, Size: 1049 bytes --]

# Dump the sender's view of the peer window at the bottom of tcp_ack().
# Note: the statement line number (3751) is specific to the 4.9.144 sources.
probe kernel.statement("tcp_ack@./net/ipv4/tcp_input.c:3751")
{
  # Only interesting while unsent data is queued on the socket
  if ($sk->sk_send_head != NULL) {
    ack_seq = @cast(&$skb->cb[0], "tcp_skb_cb", "kernel<net/tcp.h>")->seq
    printf("ack: %d, ack_seq: %d, prior_snd_una: %d\n", $ack, ack_seq, $prior_snd_una)
    seq = @cast(&$sk->sk_send_head->cb[0], "tcp_skb_cb", "kernel<net/tcp.h>")->seq
    end_seq = @cast(&$sk->sk_send_head->cb[0], "tcp_skb_cb", "kernel<net/tcp.h>")->end_seq
    printf("sk_send_head seq:%d, end_seq: %d\n", seq, end_seq)

    snd_wnd = @cast($sk, "tcp_sock", "kernel<linux/tcp.h>")->snd_wnd
    snd_wl1 = @cast($sk, "tcp_sock", "kernel<linux/tcp.h>")->snd_wl1
    ts_recent = @cast($sk, "tcp_sock", "kernel<linux/tcp.h>")->rx_opt->ts_recent
    rcv_tsval = @cast($sk, "tcp_sock", "kernel<linux/tcp.h>")->rx_opt->rcv_tsval
    printf("snd_wnd: %d, tcp_wnd_end: %d, snd_wl1: %d\n", snd_wnd, $prior_snd_una + snd_wnd, snd_wl1)
    # 0x02 is FLAG_WIN_UPDATE in net/ipv4/tcp_input.c
    printf("flag: %x, may update window: %d\n", $flag, $flag & 0x02)
    printf("rcv_tsval: %d, ts_recent: %d\n", rcv_tsval, ts_recent)
    print("\n")
  }
}



* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-15 18:23 TCP sender stuck in persist despite peer advertising non-zero window Apollon Oikonomopoulos
@ 2020-10-15 20:22 ` Neal Cardwell
  2020-10-15 20:27   ` Soheil Hassas Yeganeh
  2020-10-15 21:39   ` Yuchung Cheng
  0 siblings, 2 replies; 11+ messages in thread
From: Neal Cardwell @ 2020-10-15 20:22 UTC (permalink / raw)
  To: Apollon Oikonomopoulos
  Cc: Netdev, Eric Dumazet, Yuchung Cheng, Soheil Hassas Yeganeh

On Thu, Oct 15, 2020 at 2:31 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
>
> Hi,
>
> I'm trying to debug a (possible) TCP issue we have been encountering
> sporadically during the past couple of years. Currently we're running
> 4.9.144, but we've been observing this since at least 3.16.
>
> TL;DR: I believe we are seeing a case where snd_wl1 fails to be properly
> updated, leading to an inability to recover from the TCP persist state,
> and I would appreciate some help debugging this.

Thanks for the detailed report and diagnosis. I think we may need a
fix along the lines of the patch below.

Eric/Yuchung/Soheil, what do you think?

commit 42b37c72aa73aaabd0c01b8c05c2205236279021
Author: Neal Cardwell <ncardwell@google.com>
Date:   Thu Oct 15 16:06:11 2020 -0400

    tcp: fix to update snd_wl1 in bulk receiver fast path

    In the header prediction fast path for a bulk data receiver, if no
    data is newly acknowledged then we do not call tcp_ack() and do not
    call tcp_ack_update_window(). This means that a bulk receiver that
    receives large amounts of data can have the incoming sequence numbers
    wrap, so that the check in tcp_may_update_window fails:
       after(ack_seq, tp->snd_wl1)

    The fix is to update snd_wl1 in the header prediction fast path for a
    bulk data receiver, so that it keeps up and does not see wrapping
    problems.

    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Reported-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b1ce2054291d..75be97f6a7da 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5766,6 +5766,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
                                tcp_data_snd_check(sk);
                                if (!inet_csk_ack_scheduled(sk))
                                        goto no_ack;
+                       } else {
+                               tcp_update_wl(tp, TCP_SKB_CB(skb)->seq);
                        }

                        __tcp_ack_snd_check(sk, 0);
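
For reference, tcp_update_wl() is just the small bookkeeping helper from
include/net/tcp.h that records which incoming segment was last used for
a window update:

  static inline void tcp_update_wl(struct tcp_sock *tp, u32 seq)
  {
          tp->snd_wl1 = seq;
  }

so the change simply keeps snd_wl1 tracking the incoming sequence space
even when the fast path skips tcp_ack() entirely.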

neal


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-15 20:22 ` Neal Cardwell
@ 2020-10-15 20:27   ` Soheil Hassas Yeganeh
  2020-10-15 21:39   ` Yuchung Cheng
  1 sibling, 0 replies; 11+ messages in thread
From: Soheil Hassas Yeganeh @ 2020-10-15 20:27 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Apollon Oikonomopoulos, Netdev, Eric Dumazet, Yuchung Cheng

On Thu, Oct 15, 2020 at 4:22 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Thu, Oct 15, 2020 at 2:31 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
> >
> > Hi,
> >
> > I'm trying to debug a (possible) TCP issue we have been encountering
> > sporadically during the past couple of years. Currently we're running
> > 4.9.144, but we've been observing this since at least 3.16.
> >
> > TL;DR: I believe we are seeing a case where snd_wl1 fails to be properly
> > updated, leading to an inability to recover from the TCP persist state,
> > and I would appreciate some help debugging this.
>
> Thanks for the detailed report and diagnosis. I think we may need a
> fix along the lines of the patch below.
>
> Eric/Yuchung/Soheil, what do you think?
>
> commit 42b37c72aa73aaabd0c01b8c05c2205236279021
> Author: Neal Cardwell <ncardwell@google.com>
> Date:   Thu Oct 15 16:06:11 2020 -0400
>
>     tcp: fix to update snd_wl1 in bulk receiver fast path
>
>     In the header prediction fast path for a bulk data receiver, if no
>     data is newly acknowledged then we do not call tcp_ack() and do not
>     call tcp_ack_update_window(). This means that a bulk receiver that
>     receives large amounts of data can have the incoming sequence numbers
>     wrap, so that the check in tcp_may_update_window fails:
>        after(ack_seq, tp->snd_wl1)
>
>     The fix is to update snd_wl1 in the header prediction fast path for a
>     bulk data receiver, so that it keeps up and does not see wrapping
>     problems.
>
>     Signed-off-by: Neal Cardwell <ncardwell@google.com>
>     Reported-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Looks great to me! Thanks for the quick fix!

> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index b1ce2054291d..75be97f6a7da 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5766,6 +5766,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
>                                 tcp_data_snd_check(sk);
>                                 if (!inet_csk_ack_scheduled(sk))
>                                         goto no_ack;
> +                       } else {
> +                               tcp_update_wl(tp, TCP_SKB_CB(skb)->seq);
>                         }
>
>                         __tcp_ack_snd_check(sk, 0);
>
> neal


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-15 20:22 ` Neal Cardwell
  2020-10-15 20:27   ` Soheil Hassas Yeganeh
@ 2020-10-15 21:39   ` Yuchung Cheng
  2020-10-15 22:12     ` Apollon Oikonomopoulos
  1 sibling, 1 reply; 11+ messages in thread
From: Yuchung Cheng @ 2020-10-15 21:39 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Apollon Oikonomopoulos, Netdev, Eric Dumazet, Soheil Hassas Yeganeh

On Thu, Oct 15, 2020 at 1:22 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Thu, Oct 15, 2020 at 2:31 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
> >
> > Hi,
> >
> > I'm trying to debug a (possible) TCP issue we have been encountering
> > sporadically during the past couple of years. Currently we're running
> > 4.9.144, but we've been observing this since at least 3.16.
> >
> > TL;DR: I believe we are seeing a case where snd_wl1 fails to be properly
> > updated, leading to an inability to recover from the TCP persist state,
> > and I would appreciate some help debugging this.
>
> Thanks for the detailed report and diagnosis. I think we may need a
> fix along the lines of the patch below.
>
> Eric/Yuchung/Soheil, what do you think?
Wow, hard to believe how old this bug is. The patch looks good, but
can Apollon verify that this patch fixes the issue?

>
> commit 42b37c72aa73aaabd0c01b8c05c2205236279021
> Author: Neal Cardwell <ncardwell@google.com>
> Date:   Thu Oct 15 16:06:11 2020 -0400
>
>     tcp: fix to update snd_wl1 in bulk receiver fast path
>
>     In the header prediction fast path for a bulk data receiver, if no
>     data is newly acknowledged then we do not call tcp_ack() and do not
>     call tcp_ack_update_window(). This means that a bulk receiver that
>     receives large amounts of data can have the incoming sequence numbers
>     wrap, so that the check in tcp_may_update_window fails:
>        after(ack_seq, tp->snd_wl1)
>
>     The fix is to update snd_wl1 in the header prediction fast path for a
>     bulk data receiver, so that it keeps up and does not see wrapping
>     problems.
>
>     Signed-off-by: Neal Cardwell <ncardwell@google.com>
>     Reported-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index b1ce2054291d..75be97f6a7da 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5766,6 +5766,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
>                                 tcp_data_snd_check(sk);
>                                 if (!inet_csk_ack_scheduled(sk))
>                                         goto no_ack;
> +                       } else {
> +                               tcp_update_wl(tp, TCP_SKB_CB(skb)->seq);
>                         }
>
>                         __tcp_ack_snd_check(sk, 0);
>
> neal


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-15 21:39   ` Yuchung Cheng
@ 2020-10-15 22:12     ` Apollon Oikonomopoulos
  2020-10-15 22:37       ` Neal Cardwell
  0 siblings, 1 reply; 11+ messages in thread
From: Apollon Oikonomopoulos @ 2020-10-15 22:12 UTC (permalink / raw)
  To: Yuchung Cheng, Neal Cardwell; +Cc: Netdev, Eric Dumazet, Soheil Hassas Yeganeh

Yuchung Cheng <ycheng@google.com> writes:

> On Thu, Oct 15, 2020 at 1:22 PM Neal Cardwell <ncardwell@google.com> wrote:
>>
>> On Thu, Oct 15, 2020 at 2:31 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
>> >
>> > Hi,
>> >
>> > I'm trying to debug a (possible) TCP issue we have been encountering
>> > sporadically during the past couple of years. Currently we're running
>> > 4.9.144, but we've been observing this since at least 3.16.
>> >
>> > TL;DR: I believe we are seeing a case where snd_wl1 fails to be properly
>> > updated, leading to an inability to recover from the TCP persist state,
>> > and I would appreciate some help debugging this.
>>
>> Thanks for the detailed report and diagnosis. I think we may need a
>> fix along the lines of the patch below.

That was fast, thank you!

>>
>> Eric/Yuchung/Soheil, what do you think?
> Wow, hard to believe how old this bug is. The patch looks good, but
> can Apollon verify that this patch fixes the issue?

Sure, I can give it a try and let the systems do their thing for a couple of
days, which should be enough to see if it's fixed.

Neal, would it be possible to re-send the patch as an attachment? The
inlined version does not apply cleanly due to linewrapping and
whitespace changes and, although I can re-type it, I would prefer to test
the exact same thing that would be merged.

Regards,
Apollon


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-15 22:12     ` Apollon Oikonomopoulos
@ 2020-10-15 22:37       ` Neal Cardwell
  2020-10-16  7:35         ` Eric Dumazet
  2020-10-16 16:57         ` Apollon Oikonomopoulos
  0 siblings, 2 replies; 11+ messages in thread
From: Neal Cardwell @ 2020-10-15 22:37 UTC (permalink / raw)
  To: Apollon Oikonomopoulos
  Cc: Yuchung Cheng, Netdev, Eric Dumazet, Soheil Hassas Yeganeh

On Thu, Oct 15, 2020 at 6:12 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
>
> Yuchung Cheng <ycheng@google.com> writes:
>
> > On Thu, Oct 15, 2020 at 1:22 PM Neal Cardwell <ncardwell@google.com> wrote:
> >>
> >> On Thu, Oct 15, 2020 at 2:31 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I'm trying to debug a (possible) TCP issue we have been encountering
> >> > sporadically during the past couple of years. Currently we're running
> >> > 4.9.144, but we've been observing this since at least 3.16.
> >> >
> >> > TL;DR: I believe we are seeing a case where snd_wl1 fails to be properly
> >> > updated, leading to an inability to recover from the TCP persist state,
> >> > and I would appreciate some help debugging this.
> >>
> >> Thanks for the detailed report and diagnosis. I think we may need a
> >> fix along the lines of the patch below.
>
> That was fast, thank you!
>
> >>
> >> Eric/Yuchung/Soheil, what do you think?
> > Wow, hard to believe how old this bug is. The patch looks good, but
> > can Apollon verify that this patch fixes the issue?
>
> Sure, I can give it a try and let the systems do their thing for a couple of
> days, which should be enough to see if it's fixed.

Great, thanks!

> Neal, would it be possible to re-send the patch as an attachment? The
> inlined version does not apply cleanly due to linewrapping and
> whitespace changes and, although I can re-type it, I would prefer to test
> the exact same thing that would be merged.

Sure, I have attached the "git format-patch" format of the commit. It
does seem to apply cleanly to the v4.9.144 kernel you mentioned you
are using.

Thanks for testing this!

best,
neal

[-- Attachment #2: 0001-tcp-fix-to-update-snd_wl1-in-bulk-receiver-fast-path.patch --]
[-- Type: application/x-patch, Size: 1363 bytes --]


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-15 22:37       ` Neal Cardwell
@ 2020-10-16  7:35         ` Eric Dumazet
  2020-10-16 16:57         ` Apollon Oikonomopoulos
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2020-10-16  7:35 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Apollon Oikonomopoulos, Yuchung Cheng, Netdev, Soheil Hassas Yeganeh

On Fri, Oct 16, 2020 at 12:37 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Thu, Oct 15, 2020 at 6:12 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
> >
> > Yuchung Cheng <ycheng@google.com> writes:
> >
> > > On Thu, Oct 15, 2020 at 1:22 PM Neal Cardwell <ncardwell@google.com> wrote:
> > >>
> > >> On Thu, Oct 15, 2020 at 2:31 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I'm trying to debug a (possible) TCP issue we have been encountering
> > >> > sporadically during the past couple of years. Currently we're running
> > >> > 4.9.144, but we've been observing this since at least 3.16.
> > >> >
> > >> > TL;DR: I believe we are seeing a case where snd_wl1 fails to be properly
> > >> > updated, leading to an inability to recover from the TCP persist state,
> > >> > and I would appreciate some help debugging this.
> > >>
> > >> Thanks for the detailed report and diagnosis. I think we may need a
> > >> fix along the lines of the patch below.
> >
> > That was fast, thank you!
> >
> > >>
> > >> Eric/Yuchung/Soheil, what do you think?
> > > Wow, hard to believe how old this bug is. The patch looks good, but
> > > can Apollon verify that this patch fixes the issue?
> >
> > Sure, I can give it a try and let the systems do their thing for a couple of
> > days, which should be enough to see if it's fixed.
>
> Great, thanks!
>
> > Neal, would it be possible to re-send the patch as an attachment? The
> > inlined version does not apply cleanly due to linewrapping and
> > whitespace changes and, although I can re-type it, I would prefer to test
> > the exact same thing that would be merged.
>
> Sure, I have attached the "git format-patch" format of the commit. It
> does seem to apply cleanly to the v4.9.144 kernel you mentioned you
> are using.
>
> Thanks for testing this!
>
> best,
> neal

Ouch, this is an interesting bug. Would "netperf -t TCP_RR -- -r
2GB,2GB" be a possible test?
(I am afraid packetdrill won't be able to test this in a reasonable
amount of time.)

Neal, can you include in your changelog the link to Apollon's awesome
email? I think it was a very nice investigation, and Apollon deserves
more credit than a mere "Reported-by:" tag ;)

Maybe this one:
Link: https://www.spinics.net/lists/netdev/msg692430.html


Thanks!


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-15 22:37       ` Neal Cardwell
  2020-10-16  7:35         ` Eric Dumazet
@ 2020-10-16 16:57         ` Apollon Oikonomopoulos
  2020-10-16 17:54           ` Neal Cardwell
  2020-10-22 12:47           ` Apollon Oikonomopoulos
  1 sibling, 2 replies; 11+ messages in thread
From: Apollon Oikonomopoulos @ 2020-10-16 16:57 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Yuchung Cheng, Netdev, Eric Dumazet, Soheil Hassas Yeganeh

Neal Cardwell <ncardwell@google.com> writes:
> On Thu, Oct 15, 2020 at 6:12 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
>> Neal, would it be possible to re-send the patch as an attachment? The
>> inlined version does not apply cleanly due to linewrapping and
>> whitespace changes and, although I can re-type it, I would prefer to test
>> the exact same thing that would be merged.
>
> Sure, I have attached the "git format-patch" format of the commit. It
> does seem to apply cleanly to the v4.9.144 kernel you mentioned you
> are using.
>
> Thanks for testing this!

We are now running the patched kernel on the machines involved. I want
to give it some time just to be sure, so I'll get back to you by
Thursday if everything goes well.

Best,
Apollon


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-16 16:57         ` Apollon Oikonomopoulos
@ 2020-10-16 17:54           ` Neal Cardwell
  2020-10-22 12:47           ` Apollon Oikonomopoulos
  1 sibling, 0 replies; 11+ messages in thread
From: Neal Cardwell @ 2020-10-16 17:54 UTC (permalink / raw)
  To: Apollon Oikonomopoulos
  Cc: Yuchung Cheng, Netdev, Eric Dumazet, Soheil Hassas Yeganeh

On Fri, Oct 16, 2020 at 12:57 PM Apollon Oikonomopoulos
<apoikos@dmesg.gr> wrote:
>
> Neal Cardwell <ncardwell@google.com> writes:
> > On Thu, Oct 15, 2020 at 6:12 PM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
> >> Neal, would it be possible to re-send the patch as an attachment? The
> >> inlined version does not apply cleanly due to linewrapping and
> >> whitespace changes and, although I can re-type it, I would prefer to test
> >> the exact same thing that would be merged.
> >
> > Sure, I have attached the "git format-patch" format of the commit. It
> > does seem to apply cleanly to the v4.9.144 kernel you mentioned you
> > are using.
> >
> > Thanks for testing this!
>
> We are now running the patched kernel on the machines involved. I want
> to give it some time just to be sure, so I'll get back to you by
> Thursday if everything goes well.

Great. Thanks, Apollon, for testing this!

Thanks, Eric. I will include the link you suggested in the next rev of
the patch.

thanks,
neal


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-16 16:57         ` Apollon Oikonomopoulos
  2020-10-16 17:54           ` Neal Cardwell
@ 2020-10-22 12:47           ` Apollon Oikonomopoulos
  2020-10-22 14:39             ` Neal Cardwell
  1 sibling, 1 reply; 11+ messages in thread
From: Apollon Oikonomopoulos @ 2020-10-22 12:47 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Yuchung Cheng, Netdev, Eric Dumazet, Soheil Hassas Yeganeh

Apollon Oikonomopoulos <apoikos@dmesg.gr> writes:
> We are now running the patched kernel on the machines involved. I want
> to give it some time just to be sure, so I'll get back to you by
> Thursday if everything goes well.

It has been almost a week and we have had zero hangs in 60 rsync runs,
so I guess we can call it fixed. At the same time we didn't notice any
ill side-effects. In the unlikely event it hangs again, I will let you
know.

I spent quite some time pondering this issue, and to be honest it
troubles me that it has apparently been around for so long without
anyone else noticing. The only reasonable explanation I can come up
with is the following (please comment/correct me if I'm wrong):

 1. It will not be triggered by most L7 protocols. In "synchronous"
    request-response protocols such as HTTP, each side will usually
    consume all available data before sending. In this case, even if
    the incoming sequence numbers wrap around past snd_wl1, the bulk
    receiver is left with a non-zero window and is still able to send
    out data, causing the next acknowledgment to update the window and
    adjust snd_wl1. Also I cannot think of any asynchronous protocol
    apart from rsync where the server sends out multi-GB responses
    without checking for incoming data in the process.

 2. Regardless of the application protocol, the receiver must remain in
    the fast path with a zero send window long enough (at least 2GB of
    received data) to cause a wraparound, but not so long that
    after(ack_seq, snd_wl1) becomes true again. In practice this means
    that header prediction should not fail (not even once!) and we
    should never run out of receive space, as either condition would
    send us to the slow path and call tcp_ack(). I'd argue this is
    likely to happen only with stable, long-running, low- or
    moderately-paced TCP connections in local networks where packet
    loss is minimal (although most of the time things move around as
    fast as they can in a local network). At this point I wonder if the
    userspace rate-limiting we enabled on rsync actually did more harm…

Finally, even if someone hits this, any application caring about network
timeouts will either fail or reconnect, making it appear as a "random
network glitch" and leaving no traces behind to debug. And in the
unlikely event that your application lingers forever in the persist
state, it certainly takes a fair amount of annoyance to set aside the
assumption that you're the one at fault, decide that this might indeed
be a kernel bug, and go after it :)

Thanks again for the fast response!

Best,
Apollon

P.S: I wonder if it would make sense to expose snd_una and snd_wl1
     in struct tcp_info to ease debugging.
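
     A purely illustrative sketch of what that could look like (the
     field names below are hypothetical and do not exist in the current
     UAPI; real additions would have to be appended at the end of
     struct tcp_info in include/uapi/linux/tcp.h to preserve the ABI):

       /* Hypothetical additions, for illustration only: */
       struct tcp_info {
               /* ... all existing fields, unchanged ... */
               __u32   tcpi_snd_wl1;   /* seq used for last window update */
               __u32   tcpi_snd_una;   /* first unacknowledged sequence */
       };

       /* tcp_get_info() in net/ipv4/tcp.c would then fill these in
        * from tp->snd_wl1 and tp->snd_una. */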


* Re: TCP sender stuck in persist despite peer advertising non-zero window
  2020-10-22 12:47           ` Apollon Oikonomopoulos
@ 2020-10-22 14:39             ` Neal Cardwell
  0 siblings, 0 replies; 11+ messages in thread
From: Neal Cardwell @ 2020-10-22 14:39 UTC (permalink / raw)
  To: Apollon Oikonomopoulos
  Cc: Yuchung Cheng, Netdev, Eric Dumazet, Soheil Hassas Yeganeh

On Thu, Oct 22, 2020 at 8:47 AM Apollon Oikonomopoulos <apoikos@dmesg.gr> wrote:
>
> Apollon Oikonomopoulos <apoikos@dmesg.gr> writes:
> > We are now running the patched kernel on the machines involved. I want
> > to give it some time just to be sure, so I'll get back to you by
> > Thursday if everything goes well.
>
> It has been almost a week and we have had zero hangs in 60 rsync runs,
> so I guess we can call it fixed. At the same time we didn't notice any
> ill side-effects. In the unlikely event it hangs again, I will let you
> know.

Great. Many thanks for your testing and thorough analysis!

I agree that it's a little surprising that this bug would be there for
so long (looks like since Linux v2.1.8 in Nov 1996), but I also agree
with your analysis about why this might be so.

I have posted the proposed patch here:

  https://patchwork.ozlabs.org/project/netdev/patch/20201022143331.1887495-1-ncardwell.kernel@gmail.com/

best,
neal

