From owner-tcp-impl@lerc.nasa.gov  Fri Jun  2 13:30:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA07934
	for <tcpimpl-archive@odin.ietf.org>; Fri, 2 Jun 2000 13:30:29 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id KAA21538
	for tcp-impl-outgoing; Fri, 2 Jun 2000 10:26:32 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id KAA21505
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 10:26:29 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id KAA15214; Fri, 2 Jun 2000 10:26:27 -0400 (EDT)
Received: from ga.prestige.net(208.220.88.4) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma015138; Fri, 2 Jun 00 10:25:55 -0400
Received: from erols.com [63.79.239.202] by prestige.net with ESMTP
  (SMTPD32-6.00) id A3B44390092; Fri, 02 Jun 2000 10:24:52 -0400
Message-ID: <3937C3E9.3D37D011@erols.com>
Date: Fri, 02 Jun 2000 10:25:45 -0400
From: Frank <fdufour@erols.com>
Reply-To: fdufour@erols.com
Organization: The DuFour Family
X-Mailer: Mozilla 4.73 [en] (Win98; I)
X-Accept-Language: en
MIME-Version: 1.0
To: "Jens-S. Voeckler" <voeckler@rvs.uni-hannover.de>
CC: Hyoung-Kee Choi <hkchoi@cc.gatech.edu>, tcp-impl@grc.nasa.gov
Subject: Re: Snoop maxcount
References: <Pine.LNX.4.21.0005310956480.457-100000@animal.rvs.uni-hannover.de>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

I would think that the operating system limits would be your problem rather
than snoop.  I've forgotten the command, but check to see what the
maxfilesize limit is on your system.
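
A minimal sketch of one way to check that limit, assuming the system exposes
the POSIX per-process file size limit (RLIMIT_FSIZE) through Python's standard
`resource` module. The helper name `file_size_limit_bytes` is made up for
illustration; the shell builtin `ulimit -f` should report the same limit.

```python
import resource


def file_size_limit_bytes():
    """Return the soft RLIMIT_FSIZE limit in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return None if soft == resource.RLIM_INFINITY else soft


limit = file_size_limit_bytes()
print("unlimited" if limit is None else f"{limit} bytes")
```

If this prints a value near the size at which the trace stopped, the process
limit rather than snoop is the likely cause.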

Paula Dufour
George Washington University

"Jens-S. Voeckler" wrote:

> On Tue, 30 May 2000, Hyoung-Kee Choi wrote:
>
> ]I am having a problem with Snoop. Snoop abruptly finished after
> ]recording 4 million packets although I did not specify the -c option in
> ]the command line. The trace file size is about 510MB after the Snoop
> ]run.
>
> I am restarting my snoop every night...
>
> ]In addition, has anyone successfully recorded more than 2GB of packets
> ]in one file?
>
> Using the "largefiles" mount option might be the key here, see mount_ufs.
>
> With best wishes,
> Dipl.-Ing. Jens-S. Vöckler (voeckler@rvs.uni-hannover.de)
> Institute for Computer Networks and Distributed Systems
> University of Hanover, Germany; +49 511 762 4726



From owner-tcp-impl@lerc.nasa.gov  Fri Jun  2 20:33:39 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA17688
	for <tcpimpl-archive@odin.ietf.org>; Fri, 2 Jun 2000 20:33:38 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id RAA22236
	for tcp-impl-outgoing; Fri, 2 Jun 2000 17:56:02 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id RAA22211
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 17:56:00 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id RAA24234; Fri, 2 Jun 2000 17:55:59 -0400 (EDT)
Received: from holmes.proxinet.com(166.90.59.6) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma024215; Fri, 2 Jun 00 17:55:56 -0400
Received: from pumatech.com (nassau.proxinet.com [166.90.59.24])
	by holmes.proxinet.com (8.9.3/8.9.3) with ESMTP id OAA25074
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 14:55:49 -0700
Message-ID: <39382ECA.555E23E6@pumatech.com>
Date: Fri, 02 Jun 2000 15:01:46 -0700
From: Wu-chang Feng <wfeng@pumatech.com>
Reply-To: wfeng@pumatech.com
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.5-15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: TCP Implementors <tcp-impl@grc.nasa.gov>
Subject: Hung Solaris TCP
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Has anyone seen problems with TCP hanging on Solaris when accessing a
web site over a lossy link?  I've tried hitting pages from
www.viajeya.com using Solaris 2.6 and Solaris 2.7 (11/99 version). 
When packets arrive ordered correctly, the request immediately
returns.  When the last couple of packets come back out of order, the
client TCP (in this case either a manual telnet or a squid cache
process) sits around indefinitely.  Here's a tcpdump trace of two
identical requests: one fails and the other is OK.

Wu

Bad request
-----------
14:45:12.511537 panamint.35323 > 200.10.104.13.80: S
1806393459:1806393459(0) win 8760 <mss 1460> (DF)
14:45:13.259467 200.10.104.13.80 > panamint.35323: S
2085229208:2085229208(0) ack 1806393460 win 8760 <mss 1460> (DF)
14:45:13.259495 panamint.35323 > 200.10.104.13.80: . ack 1 win 8760
(DF)
14:45:14.198208 panamint.35323 > 200.10.104.13.80: P 1:42(41) ack 1
win 8760 (DF)
14:45:15.144267 200.10.104.13.80 > panamint.35323: . ack 42 win 8719
(DF)
14:45:15.144291 panamint.35323 > 200.10.104.13.80: P 42:44(2) ack 1
win 8760 (DF)
14:45:15.966015 200.10.104.13.80 > panamint.35323: P 1660:1665(5) ack
44 win 8717 (DF)
14:45:15.966058 panamint.35323 > 200.10.104.13.80: P 44:46(2) ack 1
win 8760 (DF)
14:45:15.966069 panamint.35323 > 200.10.104.13.80: . ack 1 win 8760
(DF)
14:45:15.970164 200.10.104.13.80 > panamint.35323: P 1:200(199) ack 44
win 8717 (DF)
14:45:15.970206 panamint.35323 > 200.10.104.13.80: . ack 200 win 8760
(DF)
14:45:16.163957 200.10.104.13.80 > panamint.35323: P 200:1660(1460)
ack 44 win 8717 (DF)
14:45:16.163984 panamint.35323 > 200.10.104.13.80: . ack 1665 win 8760
(DF)
14:45:16.973773 200.10.104.13.80 > panamint.35323: P 3125:4585(1460)
ack 46 win 8715 (DF)
14:45:16.973804 panamint.35323 > 200.10.104.13.80: . ack 1665 win 8760
(DF)
14:45:16.986916 200.10.104.13.80 > panamint.35323: P 1665:3125(1460)
ack 46 win 8715 (DF)
14:45:16.986940 panamint.35323 > 200.10.104.13.80: . ack 4585 win 8760
(DF)
14:45:17.184131 200.10.104.13.80 > panamint.35323: P 6045:7505(1460)
ack 46 win 8715 (DF)
14:45:17.184154 panamint.35323 > 200.10.104.13.80: . ack 4585 win 8760
(DF)
14:45:17.327897 200.10.104.13.80 > panamint.35323: P 4585:6045(1460)
ack 46 win 8715 (DF)
14:45:17.327925 panamint.35323 > 200.10.104.13.80: . ack 7505 win 8760
(DF)
14:45:17.805154 200.10.104.13.80 > panamint.35323: FP 8965:9198(233)
ack 46 win 8715 (DF)
14:45:17.805187 panamint.35323 > 200.10.104.13.80: . ack 7505 win 8760
(DF)
14:45:18.009141 200.10.104.13.80 > panamint.35323: P 7505:8965(1460)
ack 46 win 8715 (DF)
14:45:18.009166 panamint.35323 > 200.10.104.13.80: . ack 9198 win 8760
(DF)
------->Connection killed manually after 5 minutes.
14:50:09.086690 panamint.35323 > 200.10.104.13.80: P 46:51(5) ack 9198
win 8760 (DF)
14:50:09.638441 200.10.104.13.80 > panamint.35323: R
2085238407:2085238407(0) win 0 (DF)


Good request:
-------------
14:43:30.073361 panamint.35321 > 200.10.104.13.80: S
1793430152:1793430152(0) win 8760 <mss 1460> (DF)
14:43:30.578465 200.10.104.13.80 > panamint.35321: S
2085100976:2085100976(0) ack 1793430153 win 8760 <mss 1460> (DF)
14:43:30.578489 panamint.35321 > 200.10.104.13.80: . ack 1 win 8760
(DF)
14:43:37.043337 panamint.35321 > 200.10.104.13.80: P 1:42(41) ack 1
win 8760 (DF)
14:43:37.696615 200.10.104.13.80 > panamint.35321: . ack 42 win 8719
(DF)
14:43:37.696650 panamint.35321 > 200.10.104.13.80: P 42:44(2) ack 1
win 8760 (DF)
14:43:38.198522 200.10.104.13.80 > panamint.35321: P 1:200(199) ack 44
win 8717 (DF)
14:43:38.198673 panamint.35321 > 200.10.104.13.80: . ack 200 win 8760
(DF)
14:43:38.371998 200.10.104.13.80 > panamint.35321: P 200:1660(1460)
ack 44 win 8717 (DF)
14:43:38.372097 panamint.35321 > 200.10.104.13.80: . ack 1660 win 8760
(DF)
14:43:38.372325 200.10.104.13.80 > panamint.35321: P 1660:1665(5) ack
44 win 8717 (DF)
14:43:38.420311 panamint.35321 > 200.10.104.13.80: . ack 1665 win 8760
(DF)
14:43:38.787564 200.10.104.13.80 > panamint.35321: P 1665:3125(1460)
ack 44 win 8717 (DF)
14:43:38.830446 panamint.35321 > 200.10.104.13.80: . ack 3125 win 8760
(DF)
14:43:38.994654 200.10.104.13.80 > panamint.35321: P 3125:4585(1460)
ack 44 win 8717 (DF)
14:43:39.040430 panamint.35321 > 200.10.104.13.80: . ack 4585 win 8760
(DF)
14:43:39.048770 200.10.104.13.80 > panamint.35321: P 4585:6045(1460)
ack 44 win 8717 (DF) 
14:43:39.048834 200.10.104.13.80 > panamint.35321: . 8960:8965(5) ack
44 win 8717 (DF)
14:43:39.048859 panamint.35321 > 200.10.104.13.80: . ack 6045 win 8760
(DF)
14:43:39.109748 200.10.104.13.80 > panamint.35321: P 6045:7500(1455)
ack 44 win 8717 (DF)
14:43:39.109773 panamint.35321 > 200.10.104.13.80: . ack 7500 win 8760
(DF)
14:43:39.133228 200.10.104.13.80 > panamint.35321: P 7500:8960(1460)
ack 44 win 8717 (DF)
14:43:39.133255 panamint.35321 > 200.10.104.13.80: . ack 8965 win 8760
(DF)
14:43:39.340590 200.10.104.13.80 > panamint.35321: FP 8965:9198(233)
ack 44 win 8717 (DF)
14:43:39.340702 panamint.35321 > 200.10.104.13.80: . ack 9199 win 8760
(DF)
14:43:39.341280 panamint.35321 > 200.10.104.13.80: F 44:44(0) ack 9199
win 8760 (DF)
14:43:39.829074 200.10.104.13.80 > panamint.35321: . ack 45 win 8717
(DF)


From owner-tcp-impl@lerc.nasa.gov  Fri Jun  2 22:34:49 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA20602
	for <tcpimpl-archive@odin.ietf.org>; Fri, 2 Jun 2000 22:34:48 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id UAA28531
	for tcp-impl-outgoing; Fri, 2 Jun 2000 20:02:50 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id UAA28517
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 20:02:48 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id UAA08161; Fri, 2 Jun 2000 20:02:47 -0400 (EDT)
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma008146; Fri, 2 Jun 00 20:02:36 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id RAA04606
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 17:02:35 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.82.166])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id RAA15481
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 17:02:35 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.1+Sun/8.10.1) with SMTP id e5302YB561192
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 17:02:34 -0700 (PDT)
Date: Fri, 2 Jun 2000 17:02:33 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Hung Solaris TCP
To: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <39382ECA.555E23E6@pumatech.com>
Message-ID: <Roam.SIMC.2.0.6.959990553.176.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Has anyone seen problems with TCP hanging on Solaris when accessing a
> web site over a lossy link?  I've tried hitting pages from
> www.viajeya.com using Solaris 2.6 and Solaris 2.7 (11/99 version). 
> When packets arrive ordered correctly, the request immediately
> returns.  When the last couple of packets come back out of order, the
> client TCP (in this case either a manual telnet or a squid cache
> process) sits around indefinitely.  Here's a tcpdump trace of two
> identical requests: one fails and the other is OK.

Do you know what platform www.viajeya.com is?

> 14:45:17.805154 200.10.104.13.80 > panamint.35323: FP 8965:9198(233)
> ack 46 win 8715 (DF)
> 14:45:17.805187 panamint.35323 > 200.10.104.13.80: . ack 7505 win 8760
> (DF)
> 14:45:18.009141 200.10.104.13.80 > panamint.35323: P 7505:8965(1460)
> ack 46 win 8715 (DF)
> 14:45:18.009166 panamint.35323 > 200.10.104.13.80: . ack 9198 win 8760
> (DF)

panamint acks up to 9198.  The version of Solaris you have does not store an
out-of-order FIN.  You can see that this ack does not cover the FIN; otherwise
the ack would be 9199.  But the other side does not retransmit the FIN, and
it should, because the FIN has not been acked.  This causes the hang.  This is
a bug in www.viajeya.com.

To get around this bug in www.viajeya.com, refer to Sun RFE 4330074.  I think
the latest 2.6 patch should have this enhancement.  The change is to make
Solaris' TCP record the out of order FIN info.
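
The difference can be illustrated with a toy receiver.  This is only a sketch
under simplifying assumptions (relative sequence numbers, no overlapping
segments, no window checks); the class `Receiver` and the flag
`remember_ooo_fin` are hypothetical names, not Solaris code.  It replays the
last two server segments of the bad trace, starting from ack 7505.

```python
class Receiver:
    """Toy in-order reassembly that tracks only the next expected sequence."""

    def __init__(self, rcv_nxt, remember_ooo_fin):
        self.rcv_nxt = rcv_nxt
        self.remember_ooo_fin = remember_ooo_fin
        self.ooo = {}  # start -> (end, fin) for segments held out of order

    def _deliver(self, end, fin):
        # A FIN occupies one sequence number after the data.
        self.rcv_nxt = end + (1 if fin else 0)

    def segment(self, start, end, fin=False):
        """Process one segment and return the ack number that would be sent."""
        if start == self.rcv_nxt:
            self._deliver(end, fin)
            # Drain any buffered segments that are now in order.
            while self.rcv_nxt in self.ooo:
                self._deliver(*self.ooo.pop(self.rcv_nxt))
        elif start > self.rcv_nxt:
            # Out of order: keep the data, and the FIN only if configured to.
            self.ooo[start] = (end, fin and self.remember_ooo_fin)
        return self.rcv_nxt


for remember in (False, True):
    r = Receiver(rcv_nxt=7505, remember_ooo_fin=remember)
    first = r.segment(8965, 9198, fin=True)   # FP 8965:9198 arrives early
    second = r.segment(7505, 8965)            # P 7505:8965 fills the hole
    print(remember, first, second)
```

With the FIN forgotten, the final ack is 9198 as in the trace; with the FIN
recorded, it is 9199 and the close completes without a retransmission.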

							K. Poon.
							kcpoon@eng.sun.com





From owner-tcp-impl@lerc.nasa.gov  Fri Jun  2 22:34:52 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA20629
	for <tcpimpl-archive@odin.ietf.org>; Fri, 2 Jun 2000 22:34:51 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id UAA28458
	for tcp-impl-outgoing; Fri, 2 Jun 2000 20:01:19 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id UAA28447
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 20:01:17 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id UAA08013; Fri, 2 Jun 2000 20:01:16 -0400 (EDT)
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma007994; Fri, 2 Jun 00 20:01:10 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id RAA04196;
	Fri, 2 Jun 2000 17:00:48 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.82.166])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id RAA15098;
	Fri, 2 Jun 2000 17:00:48 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.1+Sun/8.10.1) with SMTP id e5300lB560938;
	Fri, 2 Jun 2000 17:00:47 -0700 (PDT)
Date: Fri, 2 Jun 2000 17:00:46 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Snoop maxcount
To: Hyoung-Kee Choi <hkchoi@cc.gatech.edu>
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <393413D2.F332EA5C@cc.gatech.edu>
Message-ID: <Roam.SIMC.2.0.6.959990446.9433.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> I am having a problem with Snoop. Snoop abruptly finished after
> recording 4 million packets although I did not specify the -c option in
> the command line. The trace file size is about 510MB after the Snoop
> run.

Do you get an error about out of disk space?  Snoop itself does not impose a
file size limit.  It just calculates the available space (fs.f_bavail *
fs.f_frsize) in the filesystem by calling fstatvfs().  So depending on how you
set up your filesystem, you may or may not be able to get a 2G trace file.
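
The same calculation can be reproduced outside snoop.  A minimal sketch,
assuming a Unix system where Python's `os.fstatvfs` is available; the helper
name `available_bytes` is made up for illustration, and a temporary file stands
in for the trace file.

```python
import os
import tempfile


def available_bytes(fd):
    """Space available to unprivileged users on the filesystem holding fd."""
    fs = os.fstatvfs(fd)
    return fs.f_bavail * fs.f_frsize


# Open a file on the filesystem of interest and ask how much room is left.
with tempfile.TemporaryFile() as f:
    print(available_bytes(f.fileno()))
```

Running this against a file in the directory that holds the trace shows the
figure snoop would be working from.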

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun  2 22:47:58 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA20951
	for <tcpimpl-archive@odin.ietf.org>; Fri, 2 Jun 2000 22:47:57 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id UAA00097
	for tcp-impl-outgoing; Fri, 2 Jun 2000 20:38:04 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id UAA00082
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 20:38:02 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id UAA11910; Fri, 2 Jun 2000 20:38:01 -0400 (EDT)
Received: from aland.bbn.com(204.162.9.10) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma011827; Fri, 2 Jun 00 20:37:16 -0400
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id UAA03204;
	Fri, 2 Jun 2000 20:37:14 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Message-Id: <200006030037.UAA03204@aland.bbn.com>
To: wfeng@pumatech.com
cc: TCP Implementors <tcp-impl@grc.nasa.gov>
Subject: Re: Hung Solaris TCP 
In-reply-to: Your message of "Fri, 02 Jun 2000 15:01:46 PDT."
             <39382ECA.555E23E6@pumatech.com> 
Date: Fri, 02 Jun 2000 20:37:13 -0400
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


That's an interesting bug.

If I'm reading this right, what happens is the remote server sends a FIN
to panamint, which panamint *fails* to ACK (it acks up to but not
including the FIN) if the FIN arrives out of order.  It looks like
panamint doesn't save the FIN when it saves the out of order data.

The interesting question is why the server doesn't retransmit the FIN but
instead falls into a state where it sends a RESET when panamint tries to
send later...  That is odd, as FIN-WAIT-1 is supposedly a stable
state.

Craig


From owner-tcp-impl@lerc.nasa.gov  Sat Jun  3 01:42:57 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id BAA26805
	for <tcpimpl-archive@odin.ietf.org>; Sat, 3 Jun 2000 01:42:57 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id XAA05692
	for tcp-impl-outgoing; Fri, 2 Jun 2000 23:08:49 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id XAA05680
	for <tcp-impl@grc.nasa.gov>; Fri, 2 Jun 2000 23:08:47 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id XAA26405; Fri, 2 Jun 2000 23:08:46 -0400 (EDT)
Received: from pizda.ninka.net(216.101.162.242) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma026392; Fri, 2 Jun 00 23:08:44 -0400
Received: (from davem@localhost)
	by pizda.ninka.net (8.9.3/8.9.3) id TAA21127;
	Fri, 2 Jun 2000 19:59:24 -0700
Date: Fri, 2 Jun 2000 19:59:24 -0700
Message-Id: <200006030259.TAA21127@pizda.ninka.net>
X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f
From: "David S. Miller" <davem@redhat.com>
To: craig@aland.bbn.com
CC: wfeng@pumatech.com, tcp-impl@grc.nasa.gov
In-reply-to: <200006030037.UAA03204@aland.bbn.com> (message from Craig
	Partridge on Fri, 02 Jun 2000 20:37:13 -0400)
Subject: Re: Hung Solaris TCP
References:  <200006030037.UAA03204@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

   Date: Fri, 02 Jun 2000 20:37:13 -0400
   From: Craig Partridge <craig@aland.bbn.com>

   If I'm reading this right, what happens is the remote server sends
   a FIN to panamint, which panamint *fails* to ACK (it acks up to but
   not including the FIN) if the FIN arrives out of order.  It looks
   like panamint doesn't save the FIN when it saves the out of order
   data.

   The interesting question is why the server doesn't retransmit the
   FIN but instead falls into a state where it sends a RESET when
   panamint tries to send later...  That is odd, as FIN-WAIT-1 is
   supposedly a stable state.

This reminds me, we once had a bug under Linux talking to older
Solaris stacks in that if we kept resending the FIN with data
attached, Solaris would not ACK it.  The fix was to drop the data
bytes from the final FIN packet once they were acked to work around
this Solaris TCP bug.

This is probably the same bug being seen here.

Later,
David S. Miller
davem@redhat.com


From owner-tcp-impl@lerc.nasa.gov  Mon Jun  5 07:58:26 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id HAA29407
	for <tcpimpl-archive@odin.ietf.org>; Mon, 5 Jun 2000 07:58:26 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id FAA11933
	for tcp-impl-outgoing; Mon, 5 Jun 2000 05:00:45 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id FAA11924
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 05:00:43 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id FAA00686; Mon, 5 Jun 2000 05:00:43 -0400 (EDT)
Received: from keskus.tct.hut.fi(130.233.154.176) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma000669; Mon, 5 Jun 00 05:00:28 -0400
Received: from ws18.tct.hut.fi (ws18.tct.hut.fi [130.233.154.145])
	by keskus.tct.hut.fi (8.10.0/8.10.0) with ESMTP id e5591KM11687
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 12:01:20 +0300 (EET DST)
Received: (from puhuri@localhost)
	by ws18.tct.hut.fi (8.8.8+Sun/8.8.8) id LAA04448;
	Mon, 5 Jun 2000 11:57:32 +0300 (EET DST)
To: tcp-impl@grc.nasa.gov
Subject: Re: Hung Solaris TCP
References: <Roam.SIMC.2.0.6.959990553.176.kcpoon@jurassic>
From: Markus Peuhkuri <puhuri@ws18.tct.hut.fi>
Date: 05 Jun 2000 11:57:32 +0300
In-Reply-To: Kacheong Poon's message of "Fri, 2 Jun 2000 17:02:33 -0700 (PDT)"
Message-ID: <mcud7lwijyb.fsf@ws18.tct.hut.fi>
Lines: 9
X-Mailer: Gnus v5.7/Emacs 20.4
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Kacheong Poon <Kacheong.Poon@Eng.Sun.COM> writes:

> Do you know what platform www.viajeya.com is?

	Nmap V 2.12 <URL:http://www.insecure.org/nmap/> answers 
	"Remote operating system guess: Windows NT4 / Win95 / Win98"

-- 
Markus Peuhkuri        ! http://www.iki.fi/puhuri/


From owner-tcp-impl@lerc.nasa.gov  Mon Jun  5 21:35:33 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA13369
	for <tcpimpl-archive@odin.ietf.org>; Mon, 5 Jun 2000 21:35:32 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA08038
	for tcp-impl-outgoing; Mon, 5 Jun 2000 19:21:48 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id TAA08014
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 19:21:45 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id TAA04439; Mon, 5 Jun 2000 19:21:44 -0400 (EDT)
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma004404; Mon, 5 Jun 00 19:21:16 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id QAA27676
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 16:21:14 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.82.166])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id QAA15146
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 16:21:15 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.1+Sun/8.10.1) with SMTP id e55NL9u125597
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 16:21:12 -0700 (PDT)
Date: Mon, 5 Jun 2000 16:21:08 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Hung Solaris TCP
To: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <200006030259.TAA21127@pizda.ninka.net>
Message-ID: <Roam.SIMC.2.0.6.960247268.31855.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> This reminds me, we once had a bug under Linux talking to older
> Solaris stacks in that if we kept resending the FIN with data
> attached, Solaris would not ACK it.  The fix was to drop the data
> bytes from the final FIN packet once they were acked to work around
> this Solaris TCP bug.
> 
> This is probably the same bug being seen here.

No, this is not the same.  The problem under discussion is that the other side
does not retransmit the FIN, not that it fails to accept a FIN.  We have
experience with some version of NT server which does exactly this.  It seems
that even without the FIN acknowledged, the NT server "thinks" that the FIN is
acked, and thus closes the connection.

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Mon Jun  5 22:39:27 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA15522
	for <tcpimpl-archive@odin.ietf.org>; Mon, 5 Jun 2000 22:39:27 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id UAA13586
	for tcp-impl-outgoing; Mon, 5 Jun 2000 20:30:52 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id UAA13548
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 20:30:49 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id UAA12644; Mon, 5 Jun 2000 20:30:46 -0400 (EDT)
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma012541; Mon, 5 Jun 00 20:30:02 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id RAA19599
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 17:30:00 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.88.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id RAA27977
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 17:30:00 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.1+Sun/8.10.1) with SMTP id e560Txu136234
	for <tcp-impl@grc.nasa.gov>; Mon, 5 Jun 2000 17:29:59 -0700 (PDT)
Date: Mon, 5 Jun 2000 17:29:59 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Hung Solaris TCP
To: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <mcud7lwijyb.fsf@ws18.tct.hut.fi>
Message-ID: <Roam.SIMC.2.0.6.960251399.24037.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> 	Nmap V 2.12 <URL:http://www.insecure.org/nmap/> answers 
> 	"Remote operating system guess: Windows NT4 / Win95 / Win98"

Someone sent me the following bug report in NT.

http://support.microsoft.com/support/kb/articles/Q254/9/30.ASP?LN=EN-US&SD=gn&FR=0

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Thu Jun  8 14:20:55 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA13408
	for <tcpimpl-archive@odin.ietf.org>; Thu, 8 Jun 2000 14:20:54 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id LAA03812
	for tcp-impl-outgoing; Thu, 8 Jun 2000 11:21:49 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id LAA00310;
	Thu, 8 Jun 2000 11:01:59 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id LAA27646; Thu, 8 Jun 2000 11:01:59 -0400 (EDT)
Received: from marjan.fesb.hr(161.53.166.3) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma027419; Thu, 8 Jun 00 11:00:58 -0400
Received: from jurica (jurica.fesb.hr [161.53.166.43])
	by marjan.fesb.hr (8.9.3/8.9.3) with SMTP id OAA02727;
	Thu, 8 Jun 2000 14:39:19 +0200 (MET DST)
Message-ID: <049401bfd148$cac26050$2ba635a1@fesb.hr>
From: "SoftCOM Secretary" <softcom@fesb.hr>
To: "SoftCOM Mailing List" <softcom@fesb.hr>
Subject: SoftCOM 2000 Deadline Extension and Feature Topic CFP
Date: Thu, 8 Jun 2000 14:40:34 +0200
Organization: FESB, University of Split
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Dear All,

Due to the number of requests for deadline extension the SoftCOM 2000
Organizing Committee has extended the paper submission deadline to June 19,
2000.


We use this opportunity to remind the potential contributors that the
deadline for special session "The New Millennium Telecommunications in the
Alps-Adria Countries" is approaching.

From the papers presented at SoftCOM 2000, a set of the most representative
papers (or their integrals) will be selected for publication in the August
2001 issue of the IEEE Communications Magazine.

Deadline for the Feature Topic:
Complete manuscript to be received by July 15, 2000
Notification of acceptance: July 31, 2000


The Feature Topic includes, but is not limited to:

-historical background, developments,
-operation statistics and experiences, user (universities, institutes,
  schools) opinions,
-the newest technologies and services,
-development of telelearning and videoconferencing,
-Web applications and information systems,
-security aspects,
-plans for the future, toward the new millennium,
-the role of universities and institutes,
-the role of the national and regional operators and companies, industry
  and institutions,
-joint projects, joint institutes and laboratories between academic and
  operators and companies, industry and institutions,
-participation of academic institutions in education of professionals from
  industry and vice versa,

Participation in this FT is expected from institutions which cover
academic networking (such as CARNet in Croatia, Arnes in Slovenia), as
well as from universities, institutes and schools as users, particularly
in the Alps-Adria region. In addition, contributions from national and
regional institutions, companies, operators and industry who participate in
development of academic networking, promotion of education and R&D
investigations are particularly welcome.

More details can be found at http://www.fesb.hr/SoftCOM


Thank you for your interest in SoftCOM 2000.

SoftCOM 2000 Organizing Committee






From owner-tcp-impl@lerc.nasa.gov  Sat Jun 10 02:00:21 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id CAA00428
	for <tcpimpl-archive@odin.ietf.org>; Sat, 10 Jun 2000 02:00:21 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id XAA17534
	for tcp-impl-outgoing; Fri, 9 Jun 2000 23:13:39 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id XAA17511
	for <tcp-impl@grc.nasa.gov>; Fri, 9 Jun 2000 23:13:36 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id XAA09686; Fri, 9 Jun 2000 23:13:36 -0400 (EDT)
Received: from web2004.mail.yahoo.com(128.11.68.204) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma009627; Fri, 9 Jun 00 23:12:57 -0400
Received: (qmail 29185 invoked by uid 60001); 10 Jun 2000 03:12:56 -0000
Message-ID: <20000610031256.29184.qmail@web2004.mail.yahoo.com>
Received: from [208.185.176.210] by web2004.mail.yahoo.com; Fri, 09 Jun 2000 20:12:56 PDT
Date: Fri, 9 Jun 2000 20:12:56 -0700 (PDT)
From: sankar ramamoorthi <sanka2g@yahoo.com>
Subject: network device and tcp-flow packet ordering
To: tcp-impl@grc.nasa.gov
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hi,

Is there a reference in any RFC that says the
equivalent of
'Thou shalt not reorder packets in a flow' for a
network device like a router or a switch?

I am trying to make a case that a parallel
network device (a device with a lot of parallel
engines inside it) should make an effort to ensure that
packets inside a TCP flow are not reordered - but am
losing the argument. Without any effort the device has
the potential to reorder packets inside TCP flows
because of the parallelism.

Also what would be the effect of such reordering on
TCP timers? What is the effect on SACK?

Any input on this point is welcome.

Thanks,

-- sankar ramamoorthi --




From owner-tcp-impl@lerc.nasa.gov  Sat Jun 10 10:21:37 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id KAA08783
	for <tcpimpl-archive@odin.ietf.org>; Sat, 10 Jun 2000 10:21:36 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id HAA21113
	for tcp-impl-outgoing; Sat, 10 Jun 2000 07:42:52 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id HAA21108
	for <tcp-impl@grc.nasa.gov>; Sat, 10 Jun 2000 07:42:51 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id HAA25820; Sat, 10 Jun 2000 07:42:51 -0400 (EDT)
Received: from lightning.swansea.uk.linux.org(194.168.151.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma025790; Sat, 10 Jun 00 07:42:05 -0400
Received: from alan by the-village.bc.nu with local (Exim 2.12 #1)
	id 130jbV-00085M-00; Sat, 10 Jun 2000 12:39:06 +0100
Subject: Re: network device and tcp-flow packet ordering
To: sanka2g@yahoo.com (sankar ramamoorthi)
Date: Sat, 10 Jun 2000 12:39:04 +0100 (BST)
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: <20000610031256.29184.qmail@web2004.mail.yahoo.com> from "sankar ramamoorthi" at Jun 09, 2000 08:12:56 PM
X-Mailer: ELM [version 2.5 PL1]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <E130jbV-00085M-00@the-village.bc.nu>
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

> Is there a reference in any rfc's which says
> to the equivalent of 
> 'Thou shall not reorder packets in a flow' by a
> network device like router or a switch.

Not that I am aware of. You can drop, duplicate or re-order frames at will

> I am trying to make a case that a parallel
> network-device (a device with a lot of parallel
> engines inside it) should make effort to ensure that
> packets inside a tcp flow are not reordered - but am

It should.

> losing the argument. Without any effort the device has
> the potential to reorder packets inside tcp flows
> because of the parallelism.
> 
> Also what would be the effect of such reordering on
> TCP timers? What is the effect on SACK?

Throw it through a simulator. That's what simulators are for. I think you
will find it looks not too unreasonable until you get errors, at which point
it will look a lot uglier than if the frames were almost always in order.

Also bench the host CPU utilisation at both ends - that will rise materially
if the frames are not in order and thus do not follow the fast paths.

Alan



From owner-tcp-impl@lerc.nasa.gov  Sat Jun 10 10:40:26 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id KAA08891
	for <tcpimpl-archive@odin.ietf.org>; Sat, 10 Jun 2000 10:40:26 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id IAA22252
	for tcp-impl-outgoing; Sat, 10 Jun 2000 08:24:09 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id IAA22247
	for <tcp-impl@grc.nasa.gov>; Sat, 10 Jun 2000 08:24:08 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id IAA29239; Sat, 10 Jun 2000 08:24:08 -0400 (EDT)
Received: from aland.bbn.com(204.162.9.10) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma029190; Sat, 10 Jun 00 08:23:28 -0400
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id IAA07080;
	Sat, 10 Jun 2000 08:23:26 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Message-Id: <200006101223.IAA07080@aland.bbn.com>
To: sankar ramamoorthi <sanka2g@yahoo.com>
cc: tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering 
In-reply-to: Your message of "Fri, 09 Jun 2000 20:12:56 PDT."
             <20000610031256.29184.qmail@web2004.mail.yahoo.com> 
Date: Sat, 10 Jun 2000 08:23:25 -0400
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


There's no RFC that says it.

However, Jon Bennett, Nick Shectman and I published a paper showing how
packet reordering really trashes TCP performance in IEEE/ACM Trans. on
Networking last year.

Craig

In message <20000610031256.29184.qmail@web2004.mail.yahoo.com>, sankar ramamoorthi writes:

>Hi,
>
>Is there a reference in any rfc's which says
>to the equivalent of 
>'Thou shall not reorder packets in a flow' by a
>network device like router or a switch.
>
>I am trying to make a case that a parallel
>network-device (a device with a lot of parallel
>engines inside it) should make effort to ensure that
>packets inside a tcp flow are not reordered - but am
>losing the argument. Without any effort the device has
>the potential to reorder packets inside tcp flows
>because of the parallelism.
>
>Also what would be the effect of such reordering on
>TCP timers? What is the effect on SACK?
>
>Any input on this point is welcome.
>
>Thanks,
>
>-- sankar ramamoorthi --


From owner-tcp-impl@lerc.nasa.gov  Sat Jun 10 14:42:22 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA10138
	for <tcpimpl-archive@odin.ietf.org>; Sat, 10 Jun 2000 14:42:21 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA29483
	for tcp-impl-outgoing; Sat, 10 Jun 2000 12:06:57 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id MAA29467
	for <tcp-impl@grc.nasa.gov>; Sat, 10 Jun 2000 12:06:55 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id MAA19433; Sat, 10 Jun 2000 12:06:55 -0400 (EDT)
Received: from calcite.rhyolite.com(38.159.140.3) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma019322; Sat, 10 Jun 00 12:06:13 -0400
Received: (from vjs@localhost)
	by calcite.rhyolite.com (8.9.3/calcite) id KAA11405
	for tcp-impl@grc.nasa.gov  env-from <vjs>;
	Sat, 10 Jun 2000 10:06:11 -0600 (MDT)
Date: Sat, 10 Jun 2000 10:06:11 -0600 (MDT)
From: Vernon Schryver <vjs@calcite.rhyolite.com>
Message-Id: <200006101606.KAA11405@calcite.rhyolite.com>
To: tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> From: Craig Partridge <craig@aland.bbn.com>
>
> There's no RFC that says it.
>
> However, Jon Bennett, Nick Shectman and I published a paper showing how
> packet reordering really trashes TCP performance in IEEE/ACM Trans. on
> Networking last year.

I've a real life endorsement of that thought.  Before PPP MP appeared
(long sad tale of IETF politics suppressed), I shipped what I call
BF&I Multilink (brute force and ignorance) and what others call "round
robin" or "load sharing."  MP spends about 5 bytes per packet to maintain
packet order so that protocols that don't tolerate re-ordering will work
over bundles of PPP links.  BF&I Multilink simply sends IP packets to the
shortest queue or round-robin, and so re-orders some TCP segments.  The
results of that with BSD TCP code are significant retransmissions due to
duplicate Acks, and a 10-20% reduction in throughput compared to the
obvious estimate or to MP.
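
To see why plain round-robin over a bundle re-orders segments, here is a minimal Python sketch. It is a toy model, not the BF&I code: the fixed per-link delays, the send interval, and both function names are assumptions made for illustration only.

```python
def round_robin_arrival_order(num_packets, link_delays, send_interval):
    """Return sequence numbers in the order they reach the far end.

    Packet i is sent at time i * send_interval on link i % len(link_delays)
    and arrives after that link's fixed one-way delay (an assumed model).
    """
    arrivals = []
    for seq in range(num_packets):
        link = seq % len(link_delays)
        arrival_time = seq * send_interval + link_delays[link]
        arrivals.append((arrival_time, seq))
    arrivals.sort()  # earliest arrival first; ties broken by sequence number
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count packets that arrive after a higher-numbered packet."""
    highest_seen = -1
    late = 0
    for seq in order:
        if seq < highest_seen:
            late += 1
        highest_seen = max(highest_seen, seq)
    return late


order = round_robin_arrival_order(num_packets=8, link_delays=[5.0, 1.0], send_interval=1.0)
print(order)                      # [1, 3, 0, 5, 2, 7, 4, 6]
print(count_out_of_order(order))  # 4
```

With equal delays on both links the same function returns the packets in order, so in this model the re-ordering comes entirely from the difference in queueing delay between the links.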

I suspect there are similar real life reasons why Cisco tries so hard to
maintain associations between flows and individual routes when load
balancing among routes.


Vernon Schryver    vjs@rhyolite.com


From owner-tcp-impl@lerc.nasa.gov  Sat Jun 10 19:29:26 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA11280
	for <tcpimpl-archive@odin.ietf.org>; Sat, 10 Jun 2000 19:29:26 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id QAA08730
	for tcp-impl-outgoing; Sat, 10 Jun 2000 16:57:14 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id QAA08723
	for <tcp-impl@grc.nasa.gov>; Sat, 10 Jun 2000 16:57:13 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id QAA15651; Sat, 10 Jun 2000 16:57:12 -0400 (EDT)
Received: from aland.bbn.com(204.162.9.10) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma015554; Sat, 10 Jun 00 16:57:07 -0400
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id QAA07773;
	Sat, 10 Jun 2000 16:56:58 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Message-Id: <200006102056.QAA07773@aland.bbn.com>
To: Vernon Schryver <vjs@calcite.rhyolite.com>
cc: tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering 
In-reply-to: Your message of "Sat, 10 Jun 2000 10:06:11 MDT."
             <200006101606.KAA11405@calcite.rhyolite.com> 
Date: Sat, 10 Jun 2000 16:56:58 -0400
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


In message <200006101606.KAA11405@calcite.rhyolite.com>, Vernon Schryver writes:

>I've a real life endorsement of that thought....

Just a small followup -- the paper I wrote with Bennett and Shectman
used real-life examples too.  We found a point in the Internet that
massively reordered under heavy load and caught a large number of
TCP traces that suffered reordering.  So we could see what happened
when data was reordered, when acks were reordered, and when both
were reordered.  In IEEE/ACM Trans. on Networking, Vol 7, No 6 (Dec '99).

Thanks!

Craig


From owner-tcp-impl@lerc.nasa.gov  Sun Jun 11 00:56:47 2000
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id AAA14826
	for <tcpimpl-archive@odin.ietf.org>; Sun, 11 Jun 2000 00:56:42 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id WAA17996
	for tcp-impl-outgoing; Sat, 10 Jun 2000 22:00:12 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id WAA17987
	for <tcp-impl@grc.nasa.gov>; Sat, 10 Jun 2000 22:00:11 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id WAA10969; Sat, 10 Jun 2000 22:00:11 -0400 (EDT)
Received: from prue.eim.surrey.ac.uk(131.227.76.5) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma010957; Sat, 10 Jun 00 22:00:03 -0400
Received: from petra.ee.surrey.ac.uk ([131.227.88.13] ident=eep1lw)
	by prue.eim.surrey.ac.uk with esmtp (Exim 3.03 #1)
	id 130x2C-0007Ll-00; Sun, 11 Jun 2000 02:59:32 +0100
Date: Sun, 11 Jun 2000 02:59:29 +0100 (BST)
From: Lloyd Wood <l.wood@eim.surrey.ac.uk>
X-Sender: eep1lw@petra.ee.surrey.ac.uk
Reply-To: L.Wood@eim.surrey.ac.uk
To: Craig Partridge <craig@aland.bbn.com>
cc: sankar ramamoorthi <sanka2g@yahoo.com>, tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering
In-Reply-To: <200006101223.IAA07080@aland.bbn.com>
Message-ID: <Pine.GSO.4.21.0006110117520.15333-100000@petra.ee.surrey.ac.uk>
Organization: speaking for none
X-url: http://www.ee.surrey.ac.uk/Personal/L.Wood/
X-no-archive: yes
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

On Sat, 10 Jun 2000, Craig Partridge wrote:

> There's no RFC that says it.

thank goodness. hurrah. etc.


> However, Jon Bennett, Nick Shectman and I published a paper showing how
> packet reordering really trashes TCP performance in IEEE/ACM Trans. on
> Networking last year.

imo this really indicates a problem with the state of the art of TCP's
congestion algorithms; TCP deserves to be trashed, and fast recovery
could be a lot more flexible here. (cf section IV C of that Dec. 1999
paper.)

relevant work I don't think has been mentioned in this thread yet:

http://www.ietf.org/internet-drafts/draft-allman-tcp-lossrec-00.txt

draft-allman-tcp-lossrec-00.txt [Enhancing TCP's Loss Recovery Using
Early Duplicate Acknowledgment Response, M. Allman, H. Balakrishnan,
S. Floyd]

This draft discusses selectively _lowering_ the fast recovery
threshold under certain specific low-window conditions (and cites the
paper Craig mentions above).

AFAIK there are as yet no concrete implementation proposals for TCP to
selectively _raise_ this threshold in the face of continual
misordered delivery.

(no doubt because that's a much less conservative and open-ended
 change that would require a lot more study of the dynamics under a
 wide range of conditions, and probably some experimental operational
 observation as well).

Raising the TCP sender's 3-dupack threshold to increase its tolerance
when experiencing packet/ack reordering won't restore the level of TCP
goodput to that of a perfectly ordered flow, which is the maximum
bound - but you can improve throughput and gradually approach that
bound as you raise the threshold. (I've shown this in simulation for
part of a submitted paper.)
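
A toy Python sketch of the threshold effect follows. It is not the simulation referred to above: the function name and the arrival pattern are made up for illustration, nothing is lost in transit, and it counts only spurious Fast Retransmits rather than goodput.

```python
def spurious_fast_retransmits(arrival_order, dupack_threshold):
    """Count Fast Retransmits triggered by a loss-free but reordered arrival order."""
    received = set()
    next_expected = 0   # receiver's cumulative ACK point
    last_ack = 0        # last ACK value seen by the sender
    dup_count = 0
    spurious = 0
    for seq in arrival_order:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        if next_expected == last_ack:
            # Same ACK value again: a duplicate ACK.
            dup_count += 1
            if dup_count == dupack_threshold:
                spurious += 1   # nothing was lost, so this retransmit is wasted
        else:
            last_ack = next_expected
            dup_count = 0
    return spurious


# Packets 0, 4 and 9 are delayed behind 3, 4 and 5 later packets respectively.
reordered = [1, 2, 3, 0, 5, 6, 7, 8, 4, 10, 11, 12, 13, 14, 9]
for threshold in (3, 4, 5, 6):
    print(threshold, spurious_fast_retransmits(reordered, threshold))
# 3 3
# 4 2
# 5 1
# 6 0
```

The sketch shows only one side of the trade-off: a higher threshold also means a genuinely lost packet takes more duplicate ACKs, and so more time, to be detected.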

Work done within the network reordering a flow of in-transit packets
is work wasted on not doing useful delivery.

imo any packet network should be _encouraged_ to gratuitously misorder
packets (especially if locally appropriate for delivery efficiency)
without worrying about the end effects - it's the easiest way to keep
the equipment simple (cf the work-conserving appendix to Bennett et
al's paper).

Having all the network work like hell to maintain the illusion that an
ordered circuit-like flow can be maintained all the time is a Crime
Against Nature. (Well, entropy.)

Put reordering complexity in the receiver where it belongs. Once.
Not in every implementation of every protocol in every switch.
Improve TCP, and build *simpler* fast packet networks.

L.

read can this understand you perfectly the point you if. Yoda as does.

<L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/>





From owner-tcp-impl@lerc.nasa.gov  Sun Jun 11 06:12:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA06492
	for <tcpimpl-archive@odin.ietf.org>; Sun, 11 Jun 2000 06:12:28 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id DAA28460
	for tcp-impl-outgoing; Sun, 11 Jun 2000 03:04:47 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id DAA28444
	for <tcp-impl@grc.nasa.gov>; Sun, 11 Jun 2000 03:04:42 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id DAA06169; Sun, 11 Jun 2000 03:04:42 -0400 (EDT)
Received: from unknown(209.31.7.46) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma006147; Sun, 11 Jun 00 03:04:16 -0400
Received: from Buffalo (buffalo.ehsco.com [209.31.7.44])
          by Arachnid.NTRG.com (Netscape Messaging Server 3.62)  with SMTP
          id 711 for <tcp-impl@grc.nasa.gov>;
          Sun, 11 Jun 2000 00:04:15 -0700
Message-ID: <000a01bfd373$4672ed60$2c071fd1@NTRG.com>
From: "Eric A. Hall" <ehall@ehsco.com>
To: <tcp-impl@grc.nasa.gov>
References: <Pine.GSO.4.21.0006110117520.15333-100000@petra.ee.surrey.ac.uk>
Subject: Re: network device and tcp-flow packet ordering
Date: Sun, 11 Jun 2000 00:04:24 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6600
X-Mimeole: Produced By Microsoft MimeOLE V5.00.2919.6600
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit


Lloyd Wood wrote:

> Put reordering complexity in the receiver where it belongs. Once.
> Not in every implementation of every protocol in every switch.
> Improve TCP, and build *simpler* fast packet networks.

I agree with you with regard to previous-generation gear, but in those cases
the switch path was linear. Out-of-order data is/was due to bad design of
something rather than a side effect of "normal" read-and-forward operation.
In this case, though, it seems to be a given that a parallel switch will most
likely cause a significant amount of reordering to occur naturally, which is
an unplanned phenomenon for TCP.

But it probably also depends on the end-to-end bandwidth. Reordering
probably won't be a problem on end-to-end connections that are only a
fraction of the bandwidth, since they won't get into the parallel switch
fast enough to get reordered by the switch (unless the feeds are more
than a fraction, which is not the scenario). On pipes that can be filled
with a single flow or which can feed the switch at a significant rate
anyway, I bet it knocks performance way down on the affected end-points. In
this case, maybe ISPs won't care about reordering but researchers will care
very much. Then we'll just need faster all over again.

Such a product would likely end up becoming self-selective in its market.
Maybe technically the product can reorder data all it wants, but the market
won't want a product that does it.




From owner-tcp-impl@lerc.nasa.gov  Sun Jun 11 15:30:25 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA09057
	for <tcpimpl-archive@odin.ietf.org>; Sun, 11 Jun 2000 15:30:25 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA17422
	for tcp-impl-outgoing; Sun, 11 Jun 2000 12:37:01 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id MAA17406
	for <tcp-impl@grc.nasa.gov>; Sun, 11 Jun 2000 12:37:00 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id MAA22506; Sun, 11 Jun 2000 12:36:59 -0400 (EDT)
Received: from elk.aciri.org(192.150.187.21) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma022479; Sun, 11 Jun 00 12:36:25 -0400
Received: from elk.aciri.org (localhost [127.0.0.1])
	by elk.aciri.org (8.9.3/8.9.3) with ESMTP id JAA55096;
	Sun, 11 Jun 2000 09:35:21 -0700 (PDT)
	(envelope-from floyd@elk.aciri.org)
Message-Id: <200006111635.JAA55096@elk.aciri.org>
To: sankar ramamoorthi <sanka2g@yahoo.com>
cc: tcp-impl@grc.nasa.gov
From: Sally Floyd <floyd@aciri.org>
Subject: Re: network device and tcp-flow packet ordering 
Date: Sun, 11 Jun 2000 09:35:21 -0700
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

I think it is fairly clear that currently, TCP gives abysmal
performance in the presence of significant reordering.  (When the
TCP receiver receives out-of-order packets, the TCP receiver sends
duplicate acknowledgements to tell the TCP sender.  The TCP sender
then does a Fast Retransmit, retransmitting the packet presumed to
be lost, and cutting the congestion window at least in half.  This
is true of Tahoe, Reno, NewReno, and SACK TCP, and, I presume, of
any TCP implementation more recent than 1988.)
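
The mechanism in the parenthesis above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: sequence numbers count whole packets rather than bytes, the threshold of three duplicate ACKs is the conventional value, the window is simply halved (Tahoe would cut it further), and the function names are invented for the example.

```python
DUPACK_THRESHOLD = 3  # conventional Fast Retransmit trigger (assumed here)


def cumulative_acks(arrival_order):
    """ACK sent for each arriving packet: the next packet expected in order."""
    received = set()
    next_expected = 0
    acks = []
    for seq in arrival_order:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def sender_reaction(acks, cwnd, threshold=DUPACK_THRESHOLD):
    """Return (fast_retransmitted_packets, final_cwnd) for a stream of ACKs."""
    last_ack = 0
    dup_count = 0
    retransmitted = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmitted.append(ack)  # packet presumed lost
                cwnd = max(cwnd // 2, 1)   # congestion window cut in half
        else:
            last_ack = ack
            dup_count = 0
    return retransmitted, cwnd


# Packet 0 is merely delayed behind packets 1-3; nothing is lost.
acks = cumulative_acks([1, 2, 3, 0, 4])
print(acks)                            # [0, 0, 0, 4, 5]
print(sender_reaction(acks, cwnd=8))   # ([0], 4)
```

In this run packet 0 is retransmitted and the window drops from 8 to 4 even though every packet arrived. A milder reordering such as [1, 0, 3, 2, 4] never reaches three duplicate ACKs and leaves the window untouched.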

I believe that the first step in making TCP more robust to reordering
is in the D-SACK (duplicate-SACK) extension to SACK, "An Extension
to the Selective Acknowledgement (SACK) Option for TCP",
"http://search.ietf.org/internet-drafts/draft-floyd-sack-00.txt".
This has already been approved by the IESG for Proposed Standard,
and is on the RFC editor's to-do queue.

I have a draft paper, "A Report on Some Recent Developments in TCP
Congestion Control", that discusses how the D-SACK option could be
used to make TCP more robust to reordering.  I am appending an
excerpt from that paper below.  As the excerpt makes clear, there
is a significant amount of work that would have to be done to take
the information in the D-SACK option and come out with viable,
tested algorithms that allow TCP to be robust to persistent
reordering...

- Sally
--------------------------------
http://www.aciri.org/floyd/
--------------------------------

From "A Report on Some Recent Developments in TCP
Congestion Control":

An initial step towards adding robustness in the presence of
unnecessary Retransmit Timeouts and Fast Retransmits is to give
the TCP sender the information to determine when an unnecessary
Retransmit Timeout or Fast Retransmit has occurred.  This first
step has been accomplished with the D-SACK (for duplicate-SACK)
extension \cite{FMMPR99} that has recently been added to the SACK
TCP option.  The D-SACK extension allows the TCP data receiver to
use the SACK option to report the receipt of duplicate segments.
With the use of D-SACK, the TCP sender can correctly infer the 
segments that have been received by the data receiver, including
duplicate segments. 

When the sender has retransmitted a packet, D-SACK does not allow
TCP to distinguish between the receipt at the receiver of both the
original and retransmitted packet, and the receipt of two copies  
of the retransmitted packet, one of which was duplicated in the
network.  If necessary, TCP's timestamp option could be used to
distinguish between these two cases \cite{AP99,L99}.  However, in
an environment with minimal packet replication in the network,
D-SACK allows the TCP sender to make reasonable inferences, one
round-trip time after a packet has been retransmitted, about whether
the retransmission was necessary or unnecessary.
    
If the TCP data sender determines, a round-trip time after
retransmitting a packet, that the receiver received two copies of
that segment and therefore that the packet retransmission was most
likely unnecessary, then the sender could have the option of
``undoing'' the halving in the congestion window.  The sender can
``undo'' the recent halving of the congestion window by increasing
the Slow-Start threshold ssthresh to the previous value of the old
congestion window, effectively slow-starting until the congestion
window has reached its old value.  In addition to restoring the
congestion window, the TCP sender could adjust the duplicate
acknowledgement threshold or the retransmit timeout parameters, to
avoid the wasted bandwidth of persistent unnecessary retransmits.
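
The "undo" step described above might be sketched as follows. This is an illustration only, not an algorithm from the paper or from any TCP stack: the SenderState fields and both function names are hypothetical, windows are counted in segments, and the sketch assumes minimal packet replication in the network so that a D-SACK for the retransmitted segment is taken to mean the retransmission was unnecessary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderState:
    cwnd: int                                  # congestion window, in segments
    ssthresh: int                              # slow-start threshold, in segments
    cwnd_before_halving: Optional[int] = None  # saved so the cut can be undone
    retransmitted_seq: Optional[int] = None


def on_fast_retransmit(state: SenderState, seq: int) -> None:
    """Remember the pre-reduction window, then halve as usual."""
    state.cwnd_before_halving = state.cwnd
    state.retransmitted_seq = seq
    state.ssthresh = max(state.cwnd // 2, 2)
    state.cwnd = state.ssthresh


def on_dsack(state: SenderState, duplicate_seq: int) -> bool:
    """Undo the halving if the receiver reports a duplicate of the retransmitted segment.

    Raising ssthresh to the old window lets slow start grow cwnd back to
    its previous value instead of jumping there in one step.
    """
    if state.cwnd_before_halving is None or duplicate_seq != state.retransmitted_seq:
        return False
    state.ssthresh = state.cwnd_before_halving
    state.cwnd_before_halving = None
    state.retransmitted_seq = None
    return True


state = SenderState(cwnd=20, ssthresh=64)
on_fast_retransmit(state, seq=7)
print(state.cwnd, state.ssthresh)   # 10 10
print(on_dsack(state, duplicate_seq=7))
print(state.cwnd, state.ssthresh)   # 10 20
```

Adjusting the duplicate acknowledgement threshold or the retransmit timer after a detected unnecessary retransmit is left out of the sketch, since the text treats those mechanisms as still to be evaluated.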

The first part of this work, providing the information to the sender
about duplicate packets received at the receiver, is done with the
D-SACK extension.  The next step is to evaluate specific mechanisms
for identifying an unnecessary halving of the congestion window,
and for adjusting the duplicate acknowledgement threshold or
retransmit timeout parameters.  Once this is done, there is no
fundamental reason why TCP congestion control cannot perform
effectively in an environment with persistent reordering.




From owner-tcp-impl@lerc.nasa.gov  Sun Jun 11 18:03:46 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA09861
	for <tcpimpl-archive@odin.ietf.org>; Sun, 11 Jun 2000 18:03:46 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA22782
	for tcp-impl-outgoing; Sun, 11 Jun 2000 15:19:45 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id PAA22778
	for <tcp-impl@grc.nasa.gov>; Sun, 11 Jun 2000 15:19:44 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA07040; Sun, 11 Jun 2000 15:19:44 -0400 (EDT)
Message-Id: <200006111919.PAA07040@seraph3.lerc.nasa.gov>
Received: from be.be.com(208.243.144.2) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma007037; Sun, 11 Jun 00 15:19:43 -0400
Received: (qmail 14868 invoked from network); 11 Jun 2000 19:29:32 -0000
Received: from be.be.com (HELO c225894-b.be.com) (10.0.0.2)
  by mail.be.com with SMTP; 11 Jun 2000 19:29:32 -0000
To: "Eric A. Hall" <ehall@ehsco.com>
Subject: Re: network device and tcp-flow packet ordering
Cc: tcp-impl@grc.nasa.gov
Date: Sun, 11 Jun 2000 12:13:09 GMT
From: "Howard Berkey" <howard@be.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Reply-To: howard@be.com
X-Mailer: BeOS Mail
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

>Such a product would likely end up becoming self-selective in its market.
>Maybe technically the product can reorder data all it wants, but the market
>won't want a product that does it.

With good reason.



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 03:38:42 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id DAA26638
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 03:38:42 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id BAA11844
	for tcp-impl-outgoing; Mon, 12 Jun 2000 01:06:21 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id BAA11832
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 01:06:19 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id BAA27729; Mon, 12 Jun 2000 01:06:18 -0400 (EDT)
Received: from info.iet.unipi.it(131.114.9.184) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma027597; Mon, 12 Jun 00 01:05:39 -0400
Received: (from luigi@localhost)
	by info.iet.unipi.it (8.9.3/8.9.3) id HAA13642;
	Mon, 12 Jun 2000 07:06:54 +0200 (CEST)
	(envelope-from luigi)
From: Luigi Rizzo <luigi@info.iet.unipi.it>
Message-Id: <200006120506.HAA13642@info.iet.unipi.it>
Subject: Re: network device and tcp-flow packet ordering
In-Reply-To: <200006111635.JAA55096@elk.aciri.org> from Sally Floyd at "Jun 11,
 2000 09:35:21 am"
To: Sally Floyd <floyd@aciri.org>
Date: Mon, 12 Jun 2000 07:06:54 +0200 (CEST)
CC: sankar ramamoorthi <sanka2g@yahoo.com>, tcp-impl@grc.nasa.gov
X-Mailer: ELM [version 2.4ME+ PL61 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

Overnight thinking on TCP and reordering.

Given the following:
    * the receiver knows there is reordering;
    * reordering will occur in less than 1 RTT;
    * the receiver can compute the RTT;
    * we already tolerate delayed ACKs;
    * the first duplicate ACKs will not trigger any new data transmission
      (maybe except with SACK, i am a bit out-of-date on this),
wouldn't it make sense for the receiver who sees out-of-sequence
delivery of packets to withhold the ACKs for such packets until
a) the hole is filled, or b) a small timeout has elapsed (this can be
of the same order as the delayed ack timer, or half the RTT if we have
a local estimate available)?
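
A minimal sketch of the receiver behaviour proposed above, assuming
segments are numbered 0, 1, 2, ... and every in-order segment is ACKed
at once. The class name, its fields, and the hold_time parameter are
hypothetical and only illustrate the idea; this is not taken from any
existing stack.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class HoldingReceiver:
    """Receiver that withholds ACKs for out-of-sequence segments (hypothetical)."""
    hold_time: float                      # e.g. delayed-ACK timer or RTT/2 (assumption)
    next_expected: int = 0                # next in-order segment number
    buffered: Set[int] = field(default_factory=set)
    deadline: Optional[float] = None      # when a withheld ACK must finally go out

    def on_segment(self, seq: int, now: float) -> List[int]:
        """Return the cumulative ACKs to send in response to this segment."""
        if seq < self.next_expected:
            return [self.next_expected]   # old duplicate: ACK immediately
        if seq > self.next_expected:
            self.buffered.add(seq)        # hole detected: hold the ACK back
            if self.deadline is None:
                self.deadline = now + self.hold_time
            return []
        # In-order segment: advance over anything already buffered.
        self.next_expected += 1
        while self.next_expected in self.buffered:
            self.buffered.remove(self.next_expected)
            self.next_expected += 1
        # If another hole remains, restart the hold timer; otherwise clear it.
        self.deadline = now + self.hold_time if self.buffered else None
        return [self.next_expected]

    def on_timer(self, now: float) -> List[int]:
        """Case b): the hole was not filled in time, so send the duplicate ACK."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return [self.next_expected]
        return []
```

With hold_time = 0.05, a segment that arrives one position late within
50 ms produces no duplicate ACK at all; only a hole that persists past
the deadline makes the receiver emit one.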

The only drawback i see is some burstiness in the flow, as the ACKs
that finally reach the sender would release a number of packets at once
(this could be a problem as reordering might not occur at the
bottleneck), and of course one should use byte-counting rather than
ACK-counting to open the window (this is easy and has been documented
by someone, right?)
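
To illustrate why byte-counting matters when ACKs are withheld, here is
a simplified sketch of slow-start window growth under both rules. The
function name and parameters are hypothetical, and the byte-counting
branch is the plain unlimited form, with none of the limits a real
proposal might add.

```python
def cwnd_after_acks(cwnd, mss, acked_bytes_per_ack, byte_counting):
    """Slow-start window growth for a sequence of ACKs (simplified sketch)."""
    for acked in acked_bytes_per_ack:
        if byte_counting:
            cwnd += acked   # grow by the amount of data newly acknowledged
        else:
            cwnd += mss     # grow by one segment per ACK, however much it covers
    return cwnd


# Four segments acknowledged by four ACKs, or by one ACK after a hole is filled.
MSS = 1000
four_acks = [MSS, MSS, MSS, MSS]
one_ack = [4 * MSS]
print(cwnd_after_acks(4000, MSS, four_acks, byte_counting=False))  # 8000
print(cwnd_after_acks(4000, MSS, one_ack, byte_counting=False))    # 5000
print(cwnd_after_acks(4000, MSS, one_ack, byte_counting=True))     # 8000
```

With ACK-counting the single large ACK opens the window by only one
segment, so a receiver that withholds ACKs would slow the sender down
unless the sender counts bytes.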

	cheers
	luigi

-----------------------------------+-------------------------------------
  Luigi RIZZO, luigi@iet.unipi.it  . Dip. di Ing. dell'Informazione
  http://www.iet.unipi.it/~luigi/  . Universita` di Pisa
  TEL/FAX: +39-050-568.533/522     . via Diotisalvi 2, 56126 PISA (Italy)
  Mobile   +39-347-0373137
-----------------------------------+-------------------------------------
> I think it is fairly clear that currently, TCP gives abysmal
> performance in the presence of significant reordering.  (When the
> TCP receiver receives out-of-order packets, the TCP receiver sends
> duplicate acknowledgements to tell the TCP sender.  The TCP sender
> then does a Fast Retransmit, retransmitting the packet presumed to
> be lost, and cutting the congestion window at least in half.  This
> is true of Tahoe, Reno, NewReno, and SACK TCP, and, I presume, of
> any TCP implementation more recent than 1988.)
> 
> I believe that the first step in making TCP more robust to reordering
> is in the D-SACK (duplicate-SACK) extension to SACK, "An Extension
> to the Selective Acknowledgement (SACK) Option for TCP",
> "http://search.ietf.org/internet-drafts/draft-floyd-sack-00.txt".
> This has already been approved by the IESG for Proposed Standard,
> and is on the RFC editor's to-do queue.
> 
> I have a draft paper, "A Report on Some Recent Developments in TCP
> Congestion Control", that discusses how the D-SACK option could be
> used to make TCP more robust to reordering.  I am appending an
> excerpt from that paper below.  As the excerpt makes clear, there
> is a significant amount of work that would have to be done to take
> the information in the D-SACK option and come out with viable,
> tested algorithms that allow TCP to be robust to persistent
> reordering...
> 
> - Sally
> --------------------------------
> http://www.aciri.org/floyd/
> --------------------------------
> 
> From "A Report on Some Recent Developments in TCP
> Congestion Control":
> 
> An initial step towards adding robustness in the presence of
> unnecessary Retransmit Timeouts and Fast Retransmits is to give
> the TCP sender the information to determine when an unnecessary
> Retransmit Timeout or Fast Retransmit has occurred.  This first
> step has been accomplished with the D-SACK (for duplicate-SACK)
> extension \cite{FMMPR99} that has recently been added to the SACK
> TCP option.  The D-SACK extension allows the TCP data receiver to
> use the SACK option to report the receipt of duplicate segments.
> With the use of D-SACK, the TCP sender can correctly infer the 
> segments that have been received by the data receiver, including
> duplicate segments. 
> 
> When the sender has retransmitted a packet, D-SACK does not allow
> TCP to distinguish between the receipt at the receiver of both the
> original and retransmitted packet, and the receipt of two copies  
> of the retransmitted packet, one of which was duplicated in the
> network.  If necessary, TCP's timestamp option could be used to
> distinguish between these two cases \cite{AP99,L99}.  However, in
> an environment with minimal packet replication in the network,
> D-SACK allows the TCP sender to make reasonable inferences, one
> round-trip time after a packet has been retransmitted, about whether
> the retransmission was necessary or unnecessary.
>     
> If the TCP data sender determines, a round-trip time after
> retransmitting a packet, that the receiver received two copies of
> that segment and therefore that the packet retransmission was most
> likely unnecessary, then the sender could have the option of
> ``undoing'' the halving in the congestion window.  The sender can
> ``undo'' the recent halving of the congestion window by increasing
> the Slow-Start threshold ssthresh to the previous value of the old
> congestion window, effectively slow-starting until the congestion
> window has reached its old value.  In addition to restoring the
> congestion window, the TCP sender could adjust the duplicate
> acknowledgement threshold or the retransmit timeout parameters, to
> avoid the wasted bandwidth of persistent unnecessary retransmits.
> 
> The first part of this work, providing the information to the sender
> about duplicate packets received at the receiver, is done with the
> D-SACK extension.  The next step is to evaluate specific mechanisms
> for identifying an unnecessary halving of the congestion window,
> and for adjusting the duplicate acknowledgement threshold or
> retransmit timeout parameters.  Once this is done, there is no
> fundamental reason why TCP congestion control cannot perform
> effectively in an environment with persistent reordering.
> 
> 
> 
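
The "undo" step described in the excerpt quoted above can be sketched
as follows. This is only an illustration under stated assumptions:
windows are counted in whole segments, congestion avoidance is omitted,
and the class and method names are hypothetical. How the sender decides
that a D-SACK really refers to its own retransmission is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderState:
    cwnd: int                          # congestion window, in segments
    ssthresh: int                      # slow-start threshold, in segments
    prior_cwnd: Optional[int] = None   # cwnd before the last halving

    def on_fast_retransmit(self) -> None:
        """Usual response to a presumed loss: remember cwnd, then halve."""
        self.prior_cwnd = self.cwnd
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh

    def on_dsack_for_retransmit(self) -> None:
        """Receiver got two copies, so the retransmit was likely unnecessary."""
        if self.prior_cwnd is not None:
            # Raise ssthresh to the old window; slow start then restores cwnd.
            self.ssthresh = max(self.ssthresh, self.prior_cwnd)
            self.prior_cwnd = None

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1             # slow start: one segment per ACK
        # congestion avoidance deliberately omitted in this sketch
```

Starting from cwnd = 20, a fast retransmit leaves cwnd = ssthresh = 10;
after the D-SACK, ssthresh returns to 20 and ten new ACKs bring cwnd
back to its old value.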



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 04:11:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id EAA26869
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 04:11:29 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id BAA14437
	for tcp-impl-outgoing; Mon, 12 Jun 2000 01:37:49 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id BAA14431
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 01:37:48 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id BAA00734; Mon, 12 Jun 2000 01:37:48 -0400 (EDT)
Received: from daffy.ee.lbl.gov(131.243.1.31) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma000726; Mon, 12 Jun 00 01:37:46 -0400
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.10.0/8.10.0) id e5C5bc713048;
	Sun, 11 Jun 2000 22:37:38 -0700 (PDT)
Message-Id: <200006120537.e5C5bc713048@daffy.ee.lbl.gov>
To: Luigi Rizzo <luigi@info.iet.unipi.it>
Cc: Sally Floyd <floyd@aciri.org>, sankar ramamoorthi <sanka2g@yahoo.com>,
        tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering
In-reply-to: Your message of Mon, 12 Jun 2000 07:06:54 PDT.
Date: Sun, 11 Jun 2000 22:37:38 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> wouldn't it make sense for the receiver who sees out-of-sequence
> delivery of packets to withhold the ACKs for such packets until
> a) the hole is filled, or b) a small timeout has elapsed (this can be
> of the same order of the delayed ack timer, or half the RTT if we have
> a local estimate available).

I evaluated one form of this in my packet dynamics paper:

	"End-to-End Internet Packet Dynamics", V. Paxson,
	IEEE/ACM Transactions on Networking, 7(3) pp 277-292, June 1999.

	ftp://ftp.ee.lbl.gov/papers/vp-pkt-dyn-ton99.ps.gz

The finding was that if the receiver would wait 20 msec before sending a
second duplicate ack, then you could lower the duplicate ack threshold from
3 down to 2, gaining 65% more fast retransmit opportunities, at almost no
increase in false retransmits.  It also works for the *sender* to wait
20 msec before entering fast retransmission on the 2nd dup ack, and in
that case you only need to deploy the change on the sender side.
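
A minimal sketch of the sender-side variant, assuming a threshold of 2
duplicate ACKs and a 20 msec wait as described above. The class and its
methods are hypothetical, and details such as what to do with a third
dup ack that arrives during the wait are not covered.

```python
from typing import Optional

WAIT = 0.020          # 20 msec, the figure quoted above
DUP_THRESHOLD = 2     # lowered from the usual 3


class DelayedFastRetransmit:
    """Sender that waits WAIT seconds after the 2nd dup ack (hypothetical)."""

    def __init__(self) -> None:
        self.dupacks = 0
        self.retransmit_at: Optional[float] = None

    def on_ack(self, is_duplicate: bool, now: float) -> None:
        if not is_duplicate:
            # The hole was filled: it was reordering, so cancel the timer.
            self.dupacks = 0
            self.retransmit_at = None
            return
        self.dupacks += 1
        if self.dupacks == DUP_THRESHOLD:
            self.retransmit_at = now + WAIT

    def should_retransmit(self, now: float) -> bool:
        """Poll from a timer; True once the wait has expired without new data ACKed."""
        if self.retransmit_at is not None and now >= self.retransmit_at:
            self.retransmit_at = None
            return True
        return False
```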

		Vern


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 05:50:23 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id FAA27249
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 05:50:23 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id CAA17221
	for tcp-impl-outgoing; Mon, 12 Jun 2000 02:55:51 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id CAA17208
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 02:55:49 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id CAA08347; Mon, 12 Jun 2000 02:55:48 -0400 (EDT)
Received: from info.iet.unipi.it(131.114.9.184) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma008322; Mon, 12 Jun 00 02:55:17 -0400
Received: (from luigi@localhost)
	by info.iet.unipi.it (8.9.3/8.9.3) id IAA13873;
	Mon, 12 Jun 2000 08:56:24 +0200 (CEST)
	(envelope-from luigi)
From: Luigi Rizzo <luigi@info.iet.unipi.it>
Message-Id: <200006120656.IAA13873@info.iet.unipi.it>
Subject: Re: network device and tcp-flow packet ordering
In-Reply-To: <200006120537.e5C5bc713048@daffy.ee.lbl.gov> from Vern Paxson at
 "Jun 11, 2000 10:37:38 pm"
To: Vern Paxson <vern@ee.lbl.gov>
Date: Mon, 12 Jun 2000 08:56:24 +0200 (CEST)
CC: Sally Floyd <floyd@aciri.org>, sankar ramamoorthi <sanka2g@yahoo.com>,
        tcp-impl@grc.nasa.gov
X-Mailer: ELM [version 2.4ME+ PL61 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> > wouldn't it make sense for the receiver who sees out-of-sequence
> > delivery of packets to withhold the ACKs for such packets until
> > a) the hole is filled, or b) a small timeout has elapsed (this can be
> > of the same order of the delayed ack timer, or half the RTT if we have
> > a local estimate available).
> 
> I evaluated one form of this in my packet dynamics paper:
...
> The finding was that if the receiver would wait 20 msec before sending a
> second duplicate ack, then you could lower the duplicate ack threshold from

the problem i see is that the delay should be tuned to the
bottleneck bandwidth -- i.e. it should be larger than the transmit
time for a packet or two. For MSS-sized packets, 20 ms requires at least
600 Kbit/s on the bottleneck link, so even if you have reordering
near the source (e.g. this one is using a bunch of ethernets) and then
the bottleneck is a slow modem line, you are out of luck.

That is why i think you have to make this delay comparable to the RTT.
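
The arithmetic behind the 600 Kbit/s figure can be checked with a short
sketch. The 1500-byte packet size is an assumption that happens to
reproduce the number above, and the 56 Kbit/s modem rate is only an
example of a "slow modem line"; neither value comes from the message.

```python
def min_rate_bps(packet_bytes, delay_ms):
    """Link rate at which one packet takes exactly delay_ms to transmit."""
    return packet_bytes * 8 * 1000 / delay_ms


def transmit_time_ms(packet_bytes, rate_bps):
    """Time to clock one packet onto a link of the given rate."""
    return packet_bytes * 8 * 1000 / rate_bps


PACKET_BYTES = 1500   # assumed packet size
print(min_rate_bps(PACKET_BYTES, 20))          # 600000.0 bit/s
print(transmit_time_ms(PACKET_BYTES, 56_000))  # about 214 ms on the assumed modem
```

On the assumed modem a single packet takes roughly ten times the 20 ms
delay to transmit, which is the case being worried about here.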

> increase in false retransmits.  It also works for the *sender* to wait
> 20 msec before entering fast retransmission on the 2nd dup ack, and in
> that case you only need to deploy the change on the sender side.

that's an interesting observation, because the sender knows the
RTT already (not accurately, though). So the heuristic could be something
like "do a fast retransmit and assume congestion after N dupacks
_or_ 1 RTT from the last in-sequence ack, whichever comes first" ?
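
As a rough sketch, the heuristic could look like the function below.
All names are hypothetical. The extra condition that at least one
dupack has been seen before the RTT trigger can fire is an added
assumption, so that an idle connection does not trigger a retransmit.

```python
def loss_inferred(dupacks, now, last_inseq_ack_time, rtt_estimate, n=3):
    """True when either trigger of the proposed heuristic has fired."""
    enough_dupacks = dupacks >= n
    # Assumption: the RTT trigger only applies once a dupack has signalled a hole.
    waited_one_rtt = dupacks > 0 and (now - last_inseq_ack_time) >= rtt_estimate
    return enough_dupacks or waited_one_rtt
```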

	cheers
	luigi
-----------------------------------+-------------------------------------
  Luigi RIZZO, luigi@iet.unipi.it  . Dip. di Ing. dell'Informazione
  http://www.iet.unipi.it/~luigi/  . Universita` di Pisa
  TEL/FAX: +39-050-568.533/522     . via Diotisalvi 2, 56126 PISA (Italy)
  Mobile   +39-347-0373137
-----------------------------------+-------------------------------------
> 		Vern
> 



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 06:13:43 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA27380
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 06:13:43 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id DAA21501
	for tcp-impl-outgoing; Mon, 12 Jun 2000 03:49:50 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id DAA21486
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 03:49:48 -0400 (EDT)
From: Mika.Liljeberg@nokia.com
Received: by seraph3.lerc.nasa.gov; id DAA13798; Mon, 12 Jun 2000 03:49:48 -0400 (EDT)
Received: from mgw-x2.nokia.com(131.228.20.22) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma013780; Mon, 12 Jun 00 03:49:38 -0400
Received: from mgw-i2.ntc.nokia.com (mgw-i2.ntc.nokia.com [131.228.118.61])
	by mgw-x2.nokia.com (8.9.3/8.9.3/o) with ESMTP id KAA27004;
	Mon, 12 Jun 2000 10:49:34 +0300 (EETDST)
Received: from esebh01nok.ntc.nokia.com (esebh01nok.ntc.nokia.com [131.228.118.150])
	by mgw-i2.ntc.nokia.com (8.9.3/8.9.3) with ESMTP id KAA11236;
	Mon, 12 Jun 2000 10:49:33 +0300 (EETDST)
Received: by esebh01nok with Internet Mail Service (5.5.2650.10)
	id <MM1AXLLW>; Mon, 12 Jun 2000 10:49:33 +0300
Message-ID: <593F7F3472A5D211B99B0008C7EAA08A034A85B5@eseis02nok>
To: sanka2g@yahoo.com, tcp-impl@grc.nasa.gov
Subject: RE: network device and tcp-flow packet ordering
Date: Mon, 12 Jun 2000 10:49:31 +0300
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.10)
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by lombok-fi.lerc.nasa.gov id DAA21492
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hi,

The PILC group draft "Advice for Internet Subnetwork Designers"
<draft-ietf-pilc-link-design-02.txt> has the following passage
about packet reordering in subnetworks:

Packet Reordering
   The Internet architecture does not guarantee that packets will arrive
   in the same order in which they were originally transmitted, and
   transport protocols like TCP must take this into account.  However,
   we recommend that subnetworks not gratuitously deliver packets out of
   sequence.  Since TCP returns a cumulative acknowledgment (ACK)
   indicating the last in-order segment that has arrived, out-of-order
   segments cause a TCP receiver to transmit a duplicate acknowledgment.
   When the TCP sender notices three duplicate acknowledgments it
   assumes that a segment was dropped by the network and uses the fast
   retransmit algorithm [Jac90,APS99] to resend the segment.  In
   addition, the congestion window is reduced by half, effectively
   halving TCP's sending rate.  If a subnetwork badly re-orders segments
   such that three duplicate ACKs are generated the TCP sender
   needlessly reduces the congestion window, and therefore performance.
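
To make the passage concrete, the sketch below counts how many
duplicate ACKs in a row a given arrival order would produce. It assumes
a receiver that sends one cumulative ACK per arriving segment (no
delayed ACKs) and no duplicated segments; the function name is
hypothetical.

```python
def max_dupacks(arrival_order):
    """Longest run of duplicate cumulative ACKs for the given arrival order."""
    next_expected = min(arrival_order)
    received = set()
    run = longest = 0
    for seq in arrival_order:
        received.add(seq)
        if seq == next_expected:
            # Hole filled (or no hole): the ACK advances, so it is not a duplicate.
            while next_expected in received:
                next_expected += 1
            run = 0
        else:
            # A hole remains in front of this segment: a duplicate ACK goes out.
            run += 1
            longest = max(longest, run)
    return longest


print(max_dupacks([1, 2, 3, 4, 5]))     # 0: in order, no duplicate ACKs
print(max_dupacks([1, 3, 2, 4, 5]))     # 1: adjacent swap, harmless
print(max_dupacks([1, 3, 4, 5, 2, 6]))  # 3: enough to trigger fast retransmit
```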

MikaL

> -----Original Message-----
> From: EXT sankar ramamoorthi [mailto:sanka2g@yahoo.com]
> Sent: 10. kesäkuuta 2000 6:13
> To: tcp-impl@grc.nasa.gov
> Subject: network device and tcp-flow packet ordering
> 
> 
> Hi,
> 
> Is there a reference in any rfc's which says
> to the equivalent of 
> 'Thou shall not reorder packets in a flow' by a
> network device like router or a switch.
> 
> I am trying to make a case that a parallel
> network-device (a device with a lot of parallel
> engines inside it) should make effort to ensure that
> packets inside a tcp flow are not reordered - but am
> losing the argument. Without any effort the device has
> the potential to reorder packets inside tcp flows
> because of the parallelism.
> 
> Also what would be the effect of such reordering on
> TCP timers? What is the effect on SACK?
> 
> Any input on this point is welcome.
> 
> Thanks,
> 
> -- sankar ramamoorthi --


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 06:23:55 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA27426
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 06:23:54 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id DAA21746
	for tcp-impl-outgoing; Mon, 12 Jun 2000 03:53:35 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id DAA21742
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 03:53:34 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id DAA14184; Mon, 12 Jun 2000 03:53:33 -0400 (EDT)
Received: from daffy.ee.lbl.gov(131.243.1.31) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma014174; Mon, 12 Jun 00 03:53:15 -0400
Received: (from vern@localhost)
	by daffy.ee.lbl.gov (8.10.0/8.10.0) id e5C7rCC13513;
	Mon, 12 Jun 2000 00:53:12 -0700 (PDT)
Message-Id: <200006120753.e5C7rCC13513@daffy.ee.lbl.gov>
To: Luigi Rizzo <luigi@info.iet.unipi.it>
Cc: Sally Floyd <floyd@aciri.org>, sankar ramamoorthi <sanka2g@yahoo.com>,
        tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering
In-reply-to: Your message of Mon, 12 Jun 2000 08:56:24 PDT.
Date: Mon, 12 Jun 2000 00:53:12 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> the problem i see is that the delay should be tuned to the
> bottleneck bandwidth -- i.e. it should be larger than the transmit
> time for a packet or two.

The paper discusses this a bit.  Basically, yes, if you're worrying about
single packet reorderings.  But if you are only affected by double-packet
reorderings (which is the case for lowering the threshold to 2), then this
is much less of a problem, because it requires a very large reordering
elsewhere in the network (one large enough that a packet is delayed by
more than the link's propagation time for the packet):

	\footnote{ However, as noted above, some network paths have
	substantial \emph{minimum} reordering times.  For today's
	slower-rate paths, these times can well exceed the $20$~msec figure
	we have explored.  For such paths prone to reordering, we would
	expect any approach based on delaying $\wait = 20$~msec to lead to
	significant, unnecessary retransmissions, and poor performance.
	This problem is considerably diminished for $N_d = 2$ because then
	we must have quite substantial (in terms of time) reordering in
	order to generate enough dups to falsely trigger fast
	retransmission.}

> that's an interesting observation, because the sender knows the
> RTT already (not accurately, though). So the heuristic could be something
> like "do a fast retransmit and assume congestion after N dupacks
> _or_ 1 RTT from the last in-sequence ack, whichever comes first" ?

If I'm wrong about the above, then perhaps; but I think you can indeed
get away with just waiting for 20 msec.

		Vern


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 09:37:04 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id JAA01773
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 09:37:04 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id HAA29720
	for tcp-impl-outgoing; Mon, 12 Jun 2000 07:04:51 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id HAA29711
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 07:04:49 -0400 (EDT)
From: Mika.Liljeberg@nokia.com
Received: by seraph3.lerc.nasa.gov; id HAA02228; Mon, 12 Jun 2000 07:04:49 -0400 (EDT)
Received: from mgw-x2.nokia.com(131.228.20.22) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma002171; Mon, 12 Jun 00 07:04:04 -0400
Received: from mgw-i2.ntc.nokia.com (mgw-i2.ntc.nokia.com [131.228.118.61])
	by mgw-x2.nokia.com (8.9.3/8.9.3/o) with ESMTP id OAA18360;
	Mon, 12 Jun 2000 14:04:00 +0300 (EETDST)
Received: from esebh01nok.ntc.nokia.com (esebh01nok.ntc.nokia.com [131.228.118.150])
	by mgw-i2.ntc.nokia.com (8.9.3/8.9.3) with ESMTP id OAA05569;
	Mon, 12 Jun 2000 14:03:59 +0300 (EETDST)
Received: by esebh01nok with Internet Mail Service (5.5.2650.10)
	id <MM1AX63X>; Mon, 12 Jun 2000 14:03:58 +0300
Message-ID: <593F7F3472A5D211B99B0008C7EAA08A034A85B7@eseis02nok>
To: luigi@info.iet.unipi.it, vern@ee.lbl.gov
Cc: floyd@aciri.org, sanka2g@yahoo.com, tcp-impl@grc.nasa.gov
Subject: RE: network device and tcp-flow packet ordering
Date: Mon, 12 Jun 2000 14:01:24 +0300
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.10)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hi,

I am thinking that a sender could treat a retransmission timeout
as a fast retransmit if it has received at least one dupack
in the meantime. If there are no dupacks, congestion is the
probable cause for the timeout. However, if the sender is
receiving dupacks, something must be getting through. The sender
could still do a normal fast retransmit on the 3rd dupack as usual.

This would be a very simple change to existing implementations
and, assuming reasonable window sizes, would probably increase
fast retransmit opportunities considerably. No hard data on this
though.
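
A minimal sketch of the change, assuming windows counted in segments
and the usual halving of ssthresh on loss. The function name and its
parameters are hypothetical.

```python
def on_retransmission_timeout(cwnd, ssthresh, dupacks_since_last_ack):
    """Return the new (cwnd, ssthresh) after a timeout, in segments."""
    new_ssthresh = max(cwnd // 2, 2)
    if dupacks_since_last_ack >= 1:
        # Dupacks show that data is still getting through:
        # respond as for a fast retransmit and only halve the window.
        return new_ssthresh, new_ssthresh
    # No dupacks at all: assume congestion and slow-start from one segment.
    return 1, new_ssthresh
```

For a window of 20 segments, a timeout with no dupacks gives (1, 10),
while a timeout after at least one dupack gives (10, 10).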

	MikaL


> > increase in false retransmits.  It also works for the *sender* to wait
> > 20 msec before entering fast retransmission on the 2nd dup ack, and in
> > that case you only need to deploy the change on the sender side.
> 
> that's an interesting observation, because the sender knows the
> RTT already (not accurately, though). So the heuristic could be something
> like "do a fast retransmit and assume congestion after N dupacks
> _or_ 1 RTT from the last in-sequence ack, whichever comes first" ?
> 
> 	cheers
> 	luigi


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 13:21:38 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA07309
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 13:21:38 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id KAA03763
	for tcp-impl-outgoing; Mon, 12 Jun 2000 10:52:26 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id KAA03700
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 10:52:22 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id KAA03902; Mon, 12 Jun 2000 10:52:17 -0400 (EDT)
Received: from prue.eim.surrey.ac.uk(131.227.76.5) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma003762; Mon, 12 Jun 00 10:51:33 -0400
Received: from petra.ee.surrey.ac.uk ([131.227.88.13] ident=eep1lw)
	by prue.eim.surrey.ac.uk with esmtp (Exim 3.03 #1)
	id 131VYh-0004jg-00; Mon, 12 Jun 2000 15:51:23 +0100
Date: Mon, 12 Jun 2000 15:51:20 +0100 (BST)
From: Lloyd Wood <l.wood@eim.surrey.ac.uk>
X-Sender: eep1lw@petra.ee.surrey.ac.uk
Reply-To: L.Wood@eim.surrey.ac.uk
To: "Eric A. Hall" <ehall@ehsco.com>
cc: tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering
In-Reply-To: <000a01bfd373$4672ed60$2c071fd1@NTRG.com>
Message-ID: <Pine.GSO.4.21.0006121501230.19478-100000@petra.ee.surrey.ac.uk>
Organization: speaking for none
X-url: http://www.ee.surrey.ac.uk/Personal/L.Wood/
X-no-archive: yes
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

On Sun, 11 Jun 2000, Eric A. Hall wrote:

> In this case though it
> seems to be a given that a parallel switch will most likely cause a
> significant amount of reordering to occur naturally which is an unplanned
> phenomenon for TCP.

yes.

> But it also probably depends on the end-to-end bandwidth, too. Reordering
> probably won't be a problem on end-to-end connections that are only a
> fraction of the bandwidth, since they won't get into the parallel switch
> fast enough to get out-of-ordered by the switch (unless the feeds are more
> than a fraction, which is not the scenario). 

unfortunately, TCP is extremely bursty in how it sends segments
back-to-back (especially as it opens a window). As a result of this,
reordering will still occur when the end-to-end connection occupies a
tiny fraction of the available bandwidth.
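
A toy model can show why the spacing inside a burst matters more than
the average rate. It assumes a device that sprays packets round-robin
over engines with fixed, unequal per-packet delays; the function, its
parameters, and the delay values are all hypothetical and describe no
real product.

```python
def departure_order(send_gap, engine_delays, count):
    """Order in which packets leave a device that assigns them round-robin
    to engines with the given per-packet delays (times in microseconds)."""
    departures = []
    for i in range(count):
        arrival = i * send_gap
        delay = engine_delays[i % len(engine_delays)]
        departures.append((arrival + delay, i))
    # Sort by departure time; the second tuple element is the sequence number.
    return [seq for _, seq in sorted(departures)]


print(departure_order(10, [5, 1], 4))  # [0, 1, 2, 3]: well-spaced packets stay in order
print(departure_order(1, [5, 1], 4))   # [1, 3, 0, 2]: a back-to-back burst is reordered
```

In this model reordering happens whenever the gap between two packets
is smaller than the difference in engine delays, however idle the link
is between bursts.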

(This could be avoided by widely implementing rate-based pacing in
 TCP, but rate-based pacing has been shown to suffer when used against
 the more-aggressive traditionally bursty TCP [*], so there's no
 incentive for deploying that. We're stuck at a local maximum on the
 surface of TCP performance.)

[*] http://www.cs.washington.edu/homes/savage/
    Understanding the Performance of TCP Pacing,
    Amit Aggarwal, Stefan Savage, and Tom Anderson,
    Proceedings of the 2000 IEEE Infocom Conference, Tel-Aviv,
    Israel, March, 2000.


> Such a product would likely end up becoming self-selective in its market.
> Maybe technically the product can reorder data all it wants, but the market
> won't want a product that does it.

if reordering resulting from parallelism leads to a product with
faster overall throughput, there's some incentive to deploy the
product and then related improvements to TCP.

(people buy computers based on MHz ratings; they'll buy networking
 hardware based on Gbps throughput. What the actual performance
 _seen_ turns out to be is almost secondary...)

L. 

<L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/>




From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 14:19:34 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA08460
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 14:19:34 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA19386
	for tcp-impl-outgoing; Mon, 12 Jun 2000 12:17:28 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id MAA19334
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 12:17:19 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id MAA19676; Mon, 12 Jun 2000 12:17:18 -0400 (EDT)
Received: from postal.redback.com(155.53.12.9) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma019646; Mon, 12 Jun 00 12:17:13 -0400
Received: from green.redback.com (green.redback.com [155.53.36.109])
	by postal.redback.com (Postfix) with ESMTP
	id 70AFF2AA1B; Mon, 12 Jun 2000 09:17:08 -0700 (PDT)
Received: from green.redback.com by green.redback.com (8.9.3) id JAA00491; Mon, 12 Jun 2000 09:15:28 -0700 (PDT)
Message-Id: <200006121615.JAA00491@green.redback.com>
X-Mailer: exmh version 2.1.0 09/18/1999
To: sankar ramamoorthi <sanka2g@yahoo.com>
Cc: tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering 
In-Reply-To: Your message of "Fri, 09 Jun 2000 20:12:56 PDT."
             <20000610031256.29184.qmail@web2004.mail.yahoo.com> 
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="==_Exmh_106348896P";
	 micalg=pgp-sha1; protocol="application/pgp-signature"
Content-Transfer-Encoding: 7bit
Date: Mon, 12 Jun 2000 09:15:28 -0700
From: Greg Minshall <minshall@redback.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

--==_Exmh_106348896P
Content-Type: text/plain; charset=us-ascii

let me just mention to all and sundry that *independent* of fast retransmit, 
etc., re-ordering is a code path in a TCP that is more expensive than the 
normal "in-order" delivery path.  for me, this is the fundamental reason to 
try to avoid re-ordering in the network.

notice, i did not say "eliminate"; rather, "try to avoid" (as in "not 
gratuitously deliver packets out of sequence", as in the PILC link design 
draft quoted).  by doing *some* work in the net, we can improve the situation 
for the end system, and that is worth doing.  (though, in the original query 
with a parallel network device, "some work" may be a fair amount of work in 
that single system; it would be nice, though, if the idiosyncrasies of a given 
implementation didn't permeate through the net out to end systems.)


--==_Exmh_106348896P
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 5.0i for non-commercial use
MessageID: uFSdrqqabCrJtSqGYwD7eZ5VnNdtJL/+

iQA/AwUBOUUMoG1GBZxTyU5lEQL0NACgoMnzLsB1xTppx0RZYmQfHpFTk78An1fO
f1vcmPGe7TvQFqIHkTXFi2Is
=EfIo
-----END PGP SIGNATURE-----

--==_Exmh_106348896P--


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 14:30:39 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA08598
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 14:30:39 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA20747
	for tcp-impl-outgoing; Mon, 12 Jun 2000 12:27:06 -0400 (EDT)
Received: from guns (guns.lerc.nasa.gov [139.88.87.35])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id MAA20691;
	Mon, 12 Jun 2000 12:26:34 -0400 (EDT)
Message-Id: <200006121626.MAA20691@lombok-fi.lerc.nasa.gov>
To: Luigi Rizzo <luigi@info.iet.unipi.it>
From: Mark Allman <mallman@grc.nasa.gov>
Reply-To: mallman@grc.nasa.gov
cc: Sally Floyd <floyd@aciri.org>, sankar ramamoorthi <sanka2g@yahoo.com>,
        tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering 
Organization: Late Night Hackers, NASA Glenn, Cleveland, Ohio
Song-of-the-Day: Back in the USSR
Date: Mon, 12 Jun 2000 12:26:34 -0400
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


> bottleneck), and of course one should use byte-counting rather
> than ACK-counting to open the window (this is easy and has been
> documented by someone, right ?)

Yep:

    Mark Allman. On the Generation and Use of TCP
    Acknowledgments. ACM Computer Communication Review, 28(5),
    October 1998.  
    http://roland.grc.nasa.gov/~mallman/papers/acks.ps

    Mark Allman. TCP Byte Counting Refinements. ACM Computer
    Communication Review, 29(3), July 1999. 
    http://roland.grc.nasa.gov/~mallman/papers/bc-ccr.ps

allman


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 16:04:35 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA10162
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 16:04:35 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id NAA02786
	for tcp-impl-outgoing; Mon, 12 Jun 2000 13:45:19 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id NAA02734
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 13:45:15 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id NAA05186; Mon, 12 Jun 2000 13:45:13 -0400 (EDT)
Received: from elk.aciri.org(192.150.187.21) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma005088; Mon, 12 Jun 00 13:44:31 -0400
Received: from elk.aciri.org (localhost [127.0.0.1])
	by elk.aciri.org (8.9.3/8.9.3) with ESMTP id KAA65040;
	Mon, 12 Jun 2000 10:44:23 -0700 (PDT)
	(envelope-from floyd@elk.aciri.org)
Message-Id: <200006121744.KAA65040@elk.aciri.org>
To: Luigi Rizzo <luigi@info.iet.unipi.it>
cc: tcp-impl@grc.nasa.gov
From: Sally Floyd <floyd@aciri.org>
Subject: Re: network device and tcp-flow packet ordering 
Date: Mon, 12 Jun 2000 10:44:23 -0700
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Luigi -

>Overnight thinking on TCP and reordering.

...
>wouldn't it make sense for the receiver who sees out-of-sequence
>delivery of packets to withhold the ACKs for such packets until
>a) the hole is filled, or b) a small timeout has elapsed (this can be
>of the same order of the delayed ack timer, or half the RTT if we have
>a local estimate available).

I think that the Limited Transmit approach in
draft-allman-tcp-lossrec-00.txt makes more sense: the receiver
sends dup ACKs, and the sender responds to the first and second
dup ACKs by sending new packets.  To my mind, this is closer to what
I would see as the ideal behavior, of the receiver's advertised
window being used to prevent overflow of the receiver's buffer,
and the congestion window being used to control the number of
packets outstanding in the pipe.
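
A minimal sketch of the sender-side behaviour described above, for
illustration only. The class name, the segment-based units, and the
"cwnd + 2" bound on outstanding data are assumptions made for this
example and are not taken from this message; the draft itself is the
authority on the exact conditions.

```python
from dataclasses import dataclass

DUPACK_THRESHOLD = 3  # conventional fast-retransmit trigger


@dataclass
class LimitedTransmitSender:
    cwnd: int        # congestion window, in segments
    rwnd: int        # receiver's advertised window, in segments
    in_flight: int   # segments sent but not yet cumulatively ACKed
    dupacks: int = 0

    def on_dup_ack(self) -> str:
        """React to one duplicate ACK and report the action taken."""
        self.dupacks += 1
        if self.dupacks == DUPACK_THRESHOLD:
            return "fast_retransmit"
        if self.dupacks > DUPACK_THRESHOLD:
            return "in_recovery"
        # First or second dup ACK: send one new segment if both windows
        # allow it. cwnd itself is deliberately left unchanged.
        # Assumption: outstanding data may exceed cwnd by two segments.
        limit = min(self.rwnd, self.cwnd + 2)
        if self.in_flight < limit:
            self.in_flight += 1
            return "send_new"
        return "wait"
```

In this sketch the advertised window still guards the receiver's
buffer, and the congestion window still bounds the pipe apart from the
two extra segments.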

And if small timeouts turn out to be valuable before inferring a
packet loss from dup ACKs, then the sender could do that as well
as the receiver (assuming sufficient resources on the sending
machine).

- Sally
--------------------------------
http://www.aciri.org/floyd/
--------------------------------


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 16:08:05 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA10209
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 16:08:04 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id NAA02172
	for tcp-impl-outgoing; Mon, 12 Jun 2000 13:43:47 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id NAA02126
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 13:43:44 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id NAA04900; Mon, 12 Jun 2000 13:43:43 -0400 (EDT)
Received: from smtprch1.nortelnetworks.com(192.135.215.14) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma004857; Mon, 12 Jun 00 13:43:27 -0400
Received: from zcard00m.ca.nortel.com (actually zcard00m) 
          by smtprch1.nortel.com; Mon, 12 Jun 2000 12:42:42 -0500
Received: from zcard00p.ca.nortel.com ([47.141.0.104]) 
          by zcard00m.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id MPQ1ZKMQ; Mon, 12 Jun 2000 13:43:07 -0400
Received: from pcard38c.ca.nortel.com ([47.23.82.29]) by zcard00p.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id LQHBMQYJ; Mon, 12 Jun 2000 13:43:04 -0400
Date: Mon, 12 Jun 2000 12:58:45 -0400 (EDT)
X-Sybari-Space: 00000000 00000000 00000000
From: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
X-Sender: hadi@PCARD38C.ca.nortel.com
Reply-To: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
To: Lloyd Wood <l.wood@eim.surrey.ac.uk>
cc: "Eric A. Hall" <ehall@ehsco.com>, tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering
In-Reply-To: <Pine.GSO.4.21.0006121501230.19478-100000@petra.ee.surrey.ac.uk>
Message-ID: <Pine.LNX.4.21.0006121253410.14319-100000@PCARD38C.ca.nortel.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


On Mon, 12 Jun 2000, Lloyd Wood wrote:

> if reordering resulting from parallelism leads to a product with
> faster overall throughput, there's some incentive to deploy the
> product and then related improvements to TCP.
> 
> (people buy computers based on MHz ratings; they'll buy networking
>  hardware based on Gbps throughput. What the actual performance
>  _seen_ turns out to be is almost secondary...)

Well said -- I concur. 

So this goes to the original poster:

If you can do it cheaply and get higher throughput, don't worry about
packet re-ordering. End systems should fix that. 
Of course, I realize the above statement sounds like blasphemy. 
Unfortunately it is reality. People are building parallelized path
switches; and if you think this is bad wait until "content
switching" hardware hits the market. 

cheers,
jamal



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 16:54:49 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA11026
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 16:54:44 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA12018
	for tcp-impl-outgoing; Mon, 12 Jun 2000 14:40:50 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id OAA11977
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 14:40:47 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id OAA14863; Mon, 12 Jun 2000 14:40:46 -0400 (EDT)
Received: from lightning.swansea.uk.linux.org(194.168.151.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma014816; Mon, 12 Jun 00 14:40:21 -0400
Received: from alan by the-village.bc.nu with local (Exim 2.12 #1)
	id 131Z4d-0004pg-00; Mon, 12 Jun 2000 19:36:35 +0100
Subject: Re: network device and tcp-flow packet ordering
To: hadi@nortelnetworks.com
Date: Mon, 12 Jun 2000 19:36:32 +0100 (BST)
Cc: l.wood@eim.surrey.ac.uk (Lloyd Wood), ehall@ehsco.com (Eric A. Hall),
        tcp-impl@grc.nasa.gov
In-Reply-To: <Pine.LNX.4.21.0006121253410.14319-100000@PCARD38C.ca.nortel.com> from "Jamal Hadi Salim" at Jun 12, 2000 12:58:45 PM
X-Mailer: ELM [version 2.5 PL1]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <E131Z4d-0004pg-00@the-village.bc.nu>
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> >  hardware based on Gbps throughput. What the actual performance
> >  _seen_ turns out to be is almost secondary...)
> 
> Well said -- I concur. 

People buy lines by bandwidth. So we'll see people selling 'reordering packet
line optimisers' for your routers. It would probably make a nice little Linux
netfilter project in fact 8)

> If you can do it cheaply and get higher throughput, don't worry about
> packet re-ordering. End systems should fix that. 
> Of course, I realize the above statement sounds like blasphemy. 
> Unfortunately it is reality. People are building parallelized path
> switches; and if you think this is bad wait until "content
> switching" hardware hits the market. 

Long term you are probably right. There is a limit to the rate you can get
bits down a set of wires and order them sanely the other end. However the
changeover is likely to be very messy since a few of these boxes doing
say 2% re-ordering turns the packet stream to mush by the time it's passed
through 3 or 4.
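
As a rough illustration of how per-box reordering compounds, the sketch
below assumes that each box independently disturbs a fixed fraction of
the packets passing through it. That independence model and the
function name are assumptions for this example, not something stated
above. It also counts only displaced packets and ignores how TCP
amplifies each one through duplicate ACKs and spurious retransmits.

```python
def fraction_disturbed(per_box: float, boxes: int) -> float:
    """Fraction of packets reordered by at least one of `boxes` boxes.

    Assumes each box reorders a packet independently with
    probability `per_box`.
    """
    return 1.0 - (1.0 - per_box) ** boxes


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, round(fraction_disturbed(0.02, n), 4))
```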

I imagine the sudden doubling of network use at most ISPs will encourage them
to switch back. Especially when their customers complain about poor 
performance - but yes in the end it has to be solved.

Alan



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 17:41:28 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA11735
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 17:41:28 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA17943
	for tcp-impl-outgoing; Mon, 12 Jun 2000 15:18:20 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id PAA17904
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 15:18:17 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA21061; Mon, 12 Jun 2000 15:18:17 -0400 (EDT)
Received: from aland.bbn.com(204.162.9.10) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma020858; Mon, 12 Jun 00 15:17:33 -0400
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id PAA11931;
	Mon, 12 Jun 2000 15:17:29 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Message-Id: <200006121917.PAA11931@aland.bbn.com>
To: L.Wood@eim.surrey.ac.uk
cc: tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering 
In-reply-to: Your message of "Mon, 12 Jun 2000 15:51:20 BST."
             <Pine.GSO.4.21.0006121501230.19478-100000@petra.ee.surrey.ac.uk> 
Date: Mon, 12 Jun 2000 15:17:29 -0400
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


In message <Pine.GSO.4.21.0006121501230.19478-100000@petra.ee.surrey.ac.uk>,
Lloyd Wood writes:

>(This could be avoided by widely implementing rate-based pacing in
> TCP, but rate-based pacing has been shown to suffer when used against
> the more-aggressive traditionally bursty TCP [*], so there's no
> incentive for deploying that. We're stuck at a local maximum on the
> surface of TCP performance.)
>
>[*] http://www.cs.washington.edu/homes/savage/
>    Understanding the Performance of TCP Pacing,
>    Amit Aggarwal, Stefan Savage, and Tom Anderson,
>    Proceedings of the 2000 IEEE Infocom Conference, Tel-Aviv,
>    Israel, March, 2000.

I'll be the first to say that I haven't fully absorbed the results of
this particular paper (having just skimmed it briefly) but we didn't see
signs of these problems in the simulation work we've done (published at
GLOBECOM at the same time Amit published his other TCP pacing paper).

See http://mimas.lcs.mit.edu/~jokulik/tcppacing.html for details.

We've now got a working tcp pacing implementation and are starting real
world tests, so perhaps we'll know more soon (I'll also make a note
to cross compare our results with U. Wash).

Thanks!

Craig


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 18:42:30 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA12605
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 18:42:29 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id QAA26293
	for tcp-impl-outgoing; Mon, 12 Jun 2000 16:13:09 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id QAA26263
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 16:13:06 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id QAA00536; Mon, 12 Jun 2000 16:13:05 -0400 (EDT)
Received: from aland.bbn.com(204.162.9.10) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma000474; Mon, 12 Jun 00 16:12:52 -0400
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id QAA12089;
	Mon, 12 Jun 2000 16:12:38 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Message-Id: <200006122012.QAA12089@aland.bbn.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
cc: tcp-impl@grc.nasa.gov
Subject: Re: network device and tcp-flow packet ordering 
In-reply-to: Your message of "Mon, 12 Jun 2000 19:36:32 BST."
             <E131Z4d-0004pg-00@the-village.bc.nu> 
Date: Mon, 12 Jun 2000 16:12:37 -0400
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


In message <E131Z4d-0004pg-00@the-village.bc.nu>, Alan Cox writes:

>Long term you are probably right. There is a limit to the rate you can get
>bits down a set of wires and order them sanely the other end.

This is not true.  There are some very good algorithms, some published,
some not (e.g. in the patent process) that give you full or very near
full utilization of the parallel links but retain ordering.

The best known public version is the WASHU algorithm (Varghese was one
of the authors) published, I believe, in IEEE/ACM Trans. on Networking
a few years ago.

Another algorithm, which takes a very different approach from the WASHU
algorithm and has lower complexity, is the subject of a patent I co-authored
and I'm told will be issued shortly.

Craig


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 18:42:36 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA12616
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 18:42:33 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id QAA25661
	for tcp-impl-outgoing; Mon, 12 Jun 2000 16:08:40 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id QAA25634
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 16:08:37 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id QAA29742; Mon, 12 Jun 2000 16:08:35 -0400 (EDT)
Received: from sj-msg-core-1.cisco.com(171.71.163.11) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma029701; Mon, 12 Jun 00 16:08:14 -0400
Received: from wooly-booly.cisco.com (wooly-booly.cisco.com [171.69.167.33])
	by sj-msg-core-1.cisco.com (8.9.3/8.9.1) with ESMTP id NAA18972;
	Mon, 12 Jun 2000 13:08:22 -0700 (PDT)
Received: from p7020-img-nt.cisco.com (fred-hm-dhcp1.cisco.com [171.69.128.116]) by wooly-booly.cisco.com (8.8.8-Cisco List Logging/CISCO.WS.1.2) with ESMTP id PAA19883; Mon, 12 Jun 2000 15:08:09 -0500 (CDT)
Message-Id: <4.3.2.7.2.20000612130103.00e26340@flipper.cisco.com>
X-Sender: fred@flipper.cisco.com
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Mon, 12 Jun 2000 13:01:25 -0700
To: Vernon Schryver <vjs@calcite.rhyolite.com>
From: Fred Baker <fred@cisco.com>
Subject: Re: network device and tcp-flow packet ordering
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: <200006101606.KAA11405@calcite.rhyolite.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

At 10:06 AM 6/10/00 -0600, Vernon Schryver wrote:
>I suspect there are similar real life reasons why Cisco tries so hard to
>maintain associations between flows and individual routes when load
>balancing among routes.

hmmm. Yup, that thought has occurred to us.



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 18:54:23 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA12706
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 18:54:22 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id QAA27020
	for tcp-impl-outgoing; Mon, 12 Jun 2000 16:16:54 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id QAA26994
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 16:16:51 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id QAA01372; Mon, 12 Jun 2000 16:16:50 -0400 (EDT)
Received: from sj-msg-core-1.cisco.com(171.71.163.11) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma001343; Mon, 12 Jun 00 16:16:49 -0400
Received: from wooly-booly.cisco.com (wooly-booly.cisco.com [171.69.167.33])
	by sj-msg-core-1.cisco.com (8.9.3/8.9.1) with ESMTP id NAA25753;
	Mon, 12 Jun 2000 13:16:57 -0700 (PDT)
Received: from p7020-img-nt.cisco.com (fred-hm-dhcp1.cisco.com [171.69.128.116]) by wooly-booly.cisco.com (8.8.8-Cisco List Logging/CISCO.WS.1.2) with ESMTP id PAA19899; Mon, 12 Jun 2000 15:16:44 -0500 (CDT)
Message-Id: <4.3.2.7.2.20000612130325.00e374b0@flipper.cisco.com>
X-Sender: fred@flipper.cisco.com
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Mon, 12 Jun 2000 13:11:04 -0700
To: Luigi Rizzo <luigi@info.iet.unipi.it>
From: Fred Baker <fred@cisco.com>
Subject: Re: network device and tcp-flow packet ordering
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: <200006120506.HAA13642@info.iet.unipi.it>
References: <200006111635.JAA55096@elk.aciri.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

At 07:06 AM 6/12/00 +0200, Luigi Rizzo wrote:
>Given the following:
>     * the receiver knows there is reordering;

the problem is that the receiver doesn't necessarily know that. It may know 
that some earlier packet in this session was reordered, but it 
generally has no visibility of other sessions on the same machine (which 
may use different routes) or sessions to different machines. On the first 
reordered packet in a session, it knows only that the message it has 
received is not the next one it expected to receive. The discrepancy may be 
due to reordering, and it may be due to loss. In the latter case, you'd 
like to recover as quickly as you could.

>       (maybe except with SACK, i am a bit out-of-date on this)
>wouldn't it make sense for the receiver who sees out-of-sequence
>delivery of packets to withhold the ACKs for such packets until
>a) the hole is filled, or b) a small timeout has elapsed (this can be
>of the same order of the delayed ack timer, or half the RTT if we have
>a local estimate available).

Maybe. An issue arises with delaying the Ack in the presence of a 
measurement of RTT - the delay is itself part of the RTT. So I would argue 
that only *these* Acks want to be delayed - only the Acks that might 
trigger a retransmission. And when you do send them, it would be either to 
acknowledge the now-in-order segment when it arrived and subsequently to 
trigger further transmissions (would rather not see a burst) or to trigger 
the retransmission, so in either case you actually want the specific Ack, 
not a collected Ack of several segments.

I tend to think Sally's approach, being explicit rather than heuristic, is 
a better one.



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 19:21:56 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA12907
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 19:21:55 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id RAA05108
	for tcp-impl-outgoing; Mon, 12 Jun 2000 17:07:13 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id RAA05070
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 17:07:11 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id RAA10017; Mon, 12 Jun 2000 17:07:07 -0400 (EDT)
Received: from info.iet.unipi.it(131.114.9.184) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma009949; Mon, 12 Jun 00 17:06:29 -0400
Received: (from luigi@localhost)
	by info.iet.unipi.it (8.9.3/8.9.3) id XAA04208;
	Mon, 12 Jun 2000 23:07:57 +0200 (CEST)
	(envelope-from luigi)
From: Luigi Rizzo <luigi@info.iet.unipi.it>
Message-Id: <200006122107.XAA04208@info.iet.unipi.it>
Subject: Re: network device and tcp-flow packet ordering
In-Reply-To: <4.3.2.7.2.20000612130325.00e374b0@flipper.cisco.com> from Fred
 Baker at "Jun 12, 2000 01:11:04 pm"
To: Fred Baker <fred@cisco.com>
Date: Mon, 12 Jun 2000 23:07:57 +0200 (CEST)
CC: tcp-impl@grc.nasa.gov
X-Mailer: ELM [version 2.4ME+ PL61 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> At 07:06 AM 6/12/00 +0200, Luigi Rizzo wrote:
> >Given the following:
> >     * the receiver knows there is reordering;
> 
> the problem is that the receiver doesn't necessarily know that. It may know 
...
> received is not the next one it expected to receive. The discrepancy may be 
> due to reordering, and it may be due to loss. In the latter case, you'd 
> like to recover as quickly as you could.
...
> I tend to think Sally's approach, being explicit rather than heuristic, is 
> a better one.

but as you say, the receiver doesn't necessarily know that there
is reordering, nor does the sender, so either you wait or you have
to make your bet :) So my idea would be that you start by assuming
that there is no reordering, but when you see reordering once, you
become biased toward the other assumption.
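
The bias idea above could be sketched roughly as follows. Everything
here is an assumption made for illustration: the class and method
names, the whole-segment sequence numbers, the 50 ms hold interval, and
in particular the rule used to infer reordering (a hole that fills
before three dup ACKs have gone out, so that a fast retransmit is
unlikely to be what filled it).

```python
class BiasedReceiver:
    """ACKs out-of-order data at once until reordering has been seen."""

    def __init__(self, hold_time: float = 0.05) -> None:
        self.next_expected = 0       # next in-order segment number
        self.out_of_order = set()    # segments buffered above the hole
        self.dupacks_sent = 0        # dup ACKs sent for the current hole
        self.biased = False          # True once reordering was inferred
        self.hold_time = hold_time   # hypothetical hold interval (seconds)
        self.hold_until = None       # deadline of a withheld dup ACK

    def on_segment(self, seq: int, now: float) -> list:
        """Return the cumulative ACK numbers to send right now."""
        if seq > self.next_expected:
            self.out_of_order.add(seq)
            if self.biased:
                # Withhold the dup ACK; on_timer() releases it later.
                if self.hold_until is None:
                    self.hold_until = now + self.hold_time
                return []
            self.dupacks_sent += 1
            return [self.next_expected]
        if seq < self.next_expected:
            return [self.next_expected]  # duplicate segment, re-ACK
        hole_was_open = bool(self.out_of_order)
        self.next_expected += 1
        while self.next_expected in self.out_of_order:
            self.out_of_order.remove(self.next_expected)
            self.next_expected += 1
        if hole_was_open and self.dupacks_sent < 3:
            self.biased = True       # hole filled early: assume reordering
        self.dupacks_sent = 0
        self.hold_until = None
        return [self.next_expected]

    def on_timer(self, now: float) -> list:
        """Release a withheld dup ACK once its hold interval expires."""
        if self.hold_until is not None and now >= self.hold_until:
            self.hold_until = None
            self.dupacks_sent += 1
            return [self.next_expected]
        return []
```

A real receiver would also need a way to lose the bias again, and to
cope with retransmissions that look like reordering; neither is
attempted here.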

For those arguing that delaying ACKs is detrimental to RTT measurements:
yes, it is, but we have been living with delayed acks triggered by
application reads for >10 years now...

	cheers
	luigi
-----------------------------------+-------------------------------------
  Luigi RIZZO, luigi@iet.unipi.it  . Dip. di Ing. dell'Informazione
  http://www.iet.unipi.it/~luigi/  . Universita` di Pisa
  TEL/FAX: +39-050-568.533/522     . via Diotisalvi 2, 56126 PISA (Italy)
  Mobile   +39-347-0373137
-----------------------------------+-------------------------------------


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 19:23:20 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA12918
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 19:23:19 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id RAA05479
	for tcp-impl-outgoing; Mon, 12 Jun 2000 17:10:10 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id RAA05458
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 17:10:08 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id RAA10539; Mon, 12 Jun 2000 17:10:07 -0400 (EDT)
Message-Id: <200006122110.RAA10539@seraph3.lerc.nasa.gov>
Received: from be.be.com(208.243.144.2) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma010458; Mon, 12 Jun 00 17:09:43 -0400
Received: (qmail 3807 invoked from network); 12 Jun 2000 21:19:35 -0000
Received: from be.be.com (HELO c225894-b.be.com) (10.0.0.2)
  by mail.be.com with SMTP; 12 Jun 2000 21:19:35 -0000
To: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
Subject: Re: network device and tcp-flow packet ordering
Cc: l.wood@eim.surrey.ac.uk, ehall@ehsco.com, tcp-impl@grc.nasa.gov
Date: Mon, 12 Jun 2000 14:02:57 GMT
From: "Howard Berkey" <howard@be.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Reply-To: howard@be.com
X-Mailer: BeOS Mail
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Of course, another way to look at it is that routers that don't make a 
best effort to maintain packet order will be noticed (and shunned) if 
they cause a drop in overall throughput on today's TCP implementations.


>
>On Mon, 12 Jun 2000, Lloyd Wood wrote:
>
>> if reordering resulting from parallelism leads to a product with
>> faster overall throughput, there's some incentive to deploy the
>> product and then related improvements to TCP.
>> 
>> (people buy computers based on MHz ratings; they'll buy networking
>>  hardware based on Gbps throughput. What the actual performance
>>  _seen_ turns out to be is almost secondary...)
>
>Well said -- I concur. 
>
>So this goes to the original poster:
>
>If you can do it cheaply and get higher throughput, don't worry about
>packet re-ordering. End systems should fix that. 
>Of course, I realize the above statement sounds like blasphemy. 
>Unfortunately it is reality. People are building parallelized path
>switches; and if you think this is bad wait until "content
>switching" hardware hits the market. 
>
>cheers,
>jamal
>
>


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 12 19:30:27 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA12941
	for <tcpimpl-archive@odin.ietf.org>; Mon, 12 Jun 2000 19:30:27 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id RAA05263
	for tcp-impl-outgoing; Mon, 12 Jun 2000 17:08:43 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id RAA05215
	for <tcp-impl@grc.nasa.gov>; Mon, 12 Jun 2000 17:08:38 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id RAA10293; Mon, 12 Jun 2000 17:08:37 -0400 (EDT)
Received: from info.iet.unipi.it(131.114.9.184) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma010248; Mon, 12 Jun 00 17:08:25 -0400
Received: (from luigi@localhost)
	by info.iet.unipi.it (8.9.3/8.9.3) id XAA04262;
	Mon, 12 Jun 2000 23:09:59 +0200 (CEST)
	(envelope-from luigi)
From: Luigi Rizzo <luigi@info.iet.unipi.it>
Message-Id: <200006122109.XAA04262@info.iet.unipi.it>
Subject: Re: network device and tcp-flow packet ordering
In-Reply-To: <200006121744.KAA65040@elk.aciri.org> from Sally Floyd at "Jun 12,
 2000 10:44:23 am"
To: Sally Floyd <floyd@aciri.org>
Date: Mon, 12 Jun 2000 23:09:59 +0200 (CEST)
CC: tcp-impl@grc.nasa.gov
X-Mailer: ELM [version 2.4ME+ PL61 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> I think that the Limited Transmit approach in
> draft-allman-tcp-lossrec-00.txt makes more sense, of the receiver
> sending dup ACKs, and the sender responding to the first and second
> dup ACKs by sending new packets.  To my mind, this is closer to what

agreed.

	cheers
	luigi
-----------------------------------+-------------------------------------
  Luigi RIZZO, luigi@iet.unipi.it  . Dip. di Ing. dell'Informazione
  http://www.iet.unipi.it/~luigi/  . Universita` di Pisa
  TEL/FAX: +39-050-568.533/522     . via Diotisalvi 2, 56126 PISA (Italy)
  Mobile   +39-347-0373137
-----------------------------------+-------------------------------------


From owner-tcp-impl@lerc.nasa.gov  Wed Jun 14 02:24:09 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id CAA13190
	for <tcpimpl-archive@odin.ietf.org>; Wed, 14 Jun 2000 02:24:09 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id XAA17083
	for tcp-impl-outgoing; Tue, 13 Jun 2000 23:34:38 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id XAA17075
	for <tcp-impl@grc.nasa.gov>; Tue, 13 Jun 2000 23:34:36 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id XAA06124; Tue, 13 Jun 2000 23:34:35 -0400 (EDT)
Received: from f95.law8.hotmail.com(216.33.241.95) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma006113; Tue, 13 Jun 00 23:34:31 -0400
Received: (qmail 96737 invoked by uid 0); 14 Jun 2000 03:34:30 -0000
Message-ID: <20000614033430.96736.qmail@hotmail.com>
Received: from 149.149.39.112 by www.hotmail.com with HTTP;
	Tue, 13 Jun 2000 20:34:30 PDT
X-Originating-IP: [149.149.39.112]
From: "Srinivas Kurla" <kurla@hotmail.com>
To: tcp-impl@grc.nasa.gov
Subject: Re: Tracing TCP's cwnd
Date: Tue, 13 Jun 2000 22:34:30 CDT
Mime-Version: 1.0
Content-Type: text/plain; format=flowed
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

hi,
  could anyone please tell me how to make use of the ns-2.1b6 file 
../tcl/test/misc_source.tcl to trace TCP's congestion window?
I am using this file as a source file for another file 
(../tcl/ex/test-suite.tcl) but am unable to get the required output.
Thanking you,

truly,
Srinivas Kurla



From owner-tcp-impl@lerc.nasa.gov  Wed Jun 14 23:08:57 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id XAA10114
	for <tcpimpl-archive@odin.ietf.org>; Wed, 14 Jun 2000 23:08:56 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id UAA25086
	for tcp-impl-outgoing; Wed, 14 Jun 2000 20:43:55 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id UAA25048
	for <tcp-impl@grc.nasa.gov>; Wed, 14 Jun 2000 20:43:52 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id UAA27693; Wed, 14 Jun 2000 20:43:52 -0400 (EDT)
Received: from cpl-mail1.cpl.novell.com(147.2.71.20) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma027611; Wed, 14 Jun 00 20:43:07 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by cpl-mail1.cpl.novell.com; Thu, 15 Jun 2000 02:42:23 +0200
Message-ID: <3948266F.D0C6A7E7@Novell.COM>
Date: Wed, 14 Jun 2000 18:42:23 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.72 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: TCP-IMPL <tcp-impl@grc.nasa.gov>
Subject: RTO calcuation in NetBSD/FreeBSD
Content-Type: multipart/mixed;
 boundary="------------ACD0F4E695E926E49B1E6E69"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------ACD0F4E695E926E49B1E6E69
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

It appears that NetBSD and FreeBSD's RTO calculation (from SRTT and
RTTVAR) is different from Van Jacobson's paper and the code described
in TCP/IP Illustrated Vol. 2.

TCP/IP Illustrated Vol. 2 has the calculation for delta as:

delta = rtt - 1 - (tp->t_srtt >> TCP_RTT_SHIFT);

where TCP_RTT_SHIFT is 3.

whereas NetBSD/FreeBSD have an equivalent of:

delta = ((rtt - 1) << 2 ) - (t_srtt >> 2);

Are the reasons for this change explained in detail in some paper or
something like that?
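
To make the scaling question concrete, here is a small sketch of a
fixed-point SRTT update with a gain of 1/8. With delta_shift = 0 it is
the textbook form quoted above. With delta_shift = 2 the sample is
shifted left so that delta carries two extra fractional bits, and SRTT
is assumed to be stored with correspondingly more scaling. The function
names and the delta_shift parameter are hypothetical; this is not the
NetBSD or FreeBSD source, and it does not claim to be the reason for
their change.

```python
GAIN_SHIFT = 3  # smoothing gain of 1/8, matching TCP_RTT_SHIFT above


def update_srtt(srtt_scaled: int, rtt: int, delta_shift: int = 0) -> int:
    """Apply one RTT sample (in ticks) to a scaled SRTT.

    srtt_scaled holds srtt * 2**(GAIN_SHIFT + delta_shift), so adding
    delta (in units of 2**-delta_shift ticks) applies a gain of 1/8.
    """
    delta = ((rtt - 1) << delta_shift) - (srtt_scaled >> GAIN_SHIFT)
    return max(srtt_scaled + delta, 1)


def smoothed_rtt(srtt_scaled: int, delta_shift: int = 0) -> int:
    """Recover the whole-tick SRTT from its scaled representation."""
    return srtt_scaled >> (GAIN_SHIFT + delta_shift)


if __name__ == "__main__":
    coarse, fine = 1, 1
    for _ in range(100):              # feed a constant 11-tick sample
        coarse = update_srtt(coarse, 11, delta_shift=0)
        fine = update_srtt(fine, 11, delta_shift=2)
    print(smoothed_rtt(coarse, 0), smoothed_rtt(fine, 2))
```

Both variants settle on the same whole-tick estimate for a steady
sample; the second simply keeps more fractional precision in delta
along the way.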

Thanks,

S.R.
--------------ACD0F4E695E926E49B1E6E69
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------ACD0F4E695E926E49B1E6E69--



From owner-tcp-impl@lerc.nasa.gov  Thu Jun 15 09:38:09 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id JAA01525
	for <tcpimpl-archive@odin.ietf.org>; Thu, 15 Jun 2000 09:38:09 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id GAA23902
	for tcp-impl-outgoing; Thu, 15 Jun 2000 06:41:43 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id GAA23887
	for <tcp-impl@grc.nasa.gov>; Thu, 15 Jun 2000 06:41:41 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id GAA28552; Thu, 15 Jun 2000 06:41:41 -0400 (EDT)
Received: from odin.ietf.org(132.151.1.176) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma028527; Thu, 15 Jun 00 06:41:07 -0400
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA26173;
	Thu, 15 Jun 2000 06:41:05 -0400 (EDT)
Message-Id: <200006151041.GAA26173@ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;
Cc: tcp-impl@grc.nasa.gov
From: Internet-Drafts@ietf.org
Reply-to: Internet-Drafts@ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-pmtud-04.txt
Date: Thu, 15 Jun 2000 06:41:05 -0400
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: TCP Problems with Path MTU Discovery
	Author(s)	: K. Lahey
	Filename	: draft-ietf-tcpimpl-pmtud-04.txt
	Pages		: 16
	Date		: 14-Jun-00
	
This memo catalogs several known TCP implementation problems dealing
with Path MTU Discovery [RFC1191], including the long-standing black
hole problem, stretch ACKs due to confusion between MSS and segment
size, and MSS advertisement based on PMTU.  The goal in doing so is
to improve conditions in the existing Internet by enhancing the
quality of current TCP/IP implementations.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-tcpimpl-pmtud-04.txt

Internet-Drafts are also available by anonymous FTP. Login with the username
"anonymous" and a password of your e-mail address. After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-pmtud-04.txt".

A list of Internet-Drafts directories can be found in
http://www.ietf.org/shadow.html 
or ftp://ftp.ietf.org/ietf/1shadow-sites.txt


Internet-Drafts can also be obtained by e-mail.

Send a message to:
	mailserv@ietf.org.
In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-pmtud-04.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<20000614110640.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-pmtud-04.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-pmtud-04.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<20000614110640.I-D@ietf.org>

--OtherAccess--

--NextPart--




From owner-tcp-impl@lerc.nasa.gov  Thu Jun 15 12:04:14 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id MAA06086
	for <tcpimpl-archive@odin.ietf.org>; Thu, 15 Jun 2000 12:04:13 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id JAA14327
	for tcp-impl-outgoing; Thu, 15 Jun 2000 09:36:34 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id JAA14307
	for <tcp-impl@grc.nasa.gov>; Thu, 15 Jun 2000 09:36:31 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id JAA20655; Thu, 15 Jun 2000 09:36:30 -0400 (EDT)
Received: from prv-mail21.provo.novell.com(137.65.81.126) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma020607; Thu, 15 Jun 00 09:35:58 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail21.provo.novell.com; Thu, 15 Jun 2000 07:34:21 -0600
Message-ID: <3948DB59.5BEAA29C@Novell.COM>
Date: Thu, 15 Jun 2000 07:34:17 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.72 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: TCP-IMPL <tcp-impl@grc.nasa.gov>
Subject: Re: RTO calcuation in NetBSD/FreeBSD
References: <3948266F.D0C6A7E7@Novell.COM>
Content-Type: multipart/mixed;
 boundary="------------36589E58B74B1250E6AC4079"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------36589E58B74B1250E6AC4079
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I guess I found the answer :-#. The final computation for RTO in
TCP/IP Illustrated vol. 2:

RTO = t_srtt/8 + t_rttvar

has now become, in NetBSD and FreeBSD:

RTO = (t_srtt/8 + t_rttvar)/4

The division by 4 would probably offset the initial multiplication of the RTT by 4.
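
A minimal Python sketch of the two formulas above. It assumes (this is
not stated in the book or the BSD sources quoted here) that the scaled
variant keeps both t_srtt and t_rttvar at four times the classic values;
the function names are hypothetical:

```python
def rto_classic(t_srtt, t_rttvar):
    # Formula quoted from TCP/IP Illustrated vol. 2: t_srtt/8 + t_rttvar
    return (t_srtt >> 3) + t_rttvar


def rto_scaled(t_srtt, t_rttvar):
    # Formula quoted for NetBSD/FreeBSD: (t_srtt/8 + t_rttvar)/4
    return ((t_srtt >> 3) + t_rttvar) >> 2


# Assumption: the scaled state variables are 4x the classic ones.
classic = rto_classic(80, 6)          # (80 >> 3) + 6
scaled = rto_scaled(4 * 80, 4 * 6)    # ((320 >> 3) + 24) >> 2
print(classic, scaled)                # prints: 16 16
```

Under that assumption the final shift by 2 undoes the extra factor of 4,
so both variants yield the same RTO in ticks.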

Thanks,

S.R.

Ramesh Shankar wrote:
> 
> It appears that NetBSD and FreeBSD's RTO calculation (from SRTT and
> RTTVAR) is different from the Van Jacbson's paper and the code described
> in TCP/IP Illustrated vol2.
> 
> TCP/IP illustrated vol 2 has the calculation for delta as:
> 
> delta = rtt - 1 - (tp->t_srtt >> TCP_RTT_SHIFT);
> 
> where TCP_RTT_SHIFT is 3.
> 
> whereas NetBSD/FreeBSD have an equivalent of:
> 
> delta = ((rtt - 1) << 2 ) - (t_srtt >> 2);
> 
> Are the reasons for this change explained in detail in some paper or
> something like that?
> 
> Thanks,
> 
> S.R.
--------------36589E58B74B1250E6AC4079
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------36589E58B74B1250E6AC4079--



From owner-tcp-impl@lerc.nasa.gov  Thu Jun 15 22:16:33 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA21043
	for <tcpimpl-archive@odin.ietf.org>; Thu, 15 Jun 2000 22:16:33 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA08393
	for tcp-impl-outgoing; Thu, 15 Jun 2000 19:34:31 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id TAA08386
	for <tcp-impl@grc.nasa.gov>; Thu, 15 Jun 2000 19:34:29 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id TAA26258; Thu, 15 Jun 2000 19:34:29 -0400 (EDT)
Received: from prv-mail25.provo.novell.com(137.65.81.121) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma026235; Thu, 15 Jun 00 19:34:01 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail25.provo.novell.com; Thu, 15 Jun 2000 17:33:40 -0600
Message-ID: <394967D7.9613F0F1@Novell.COM>
Date: Thu, 15 Jun 2000 17:33:43 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.72 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: TCP-IMPL <tcp-impl@grc.nasa.gov>
Subject: Re: RTO calcuation in NetBSD/FreeBSD
References: <3948266F.D0C6A7E7@Novell.COM> <3948DB59.5BEAA29C@Novell.COM>
Content-Type: multipart/mixed;
 boundary="------------D5F81DBE21EE4D52AB3B9995"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------D5F81DBE21EE4D52AB3B9995
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Some more info:

The modified FreeBSD/NetBSD implementation is described in Brakmo and
Peterson's paper, "Performance problems in BSD4.4 TCP". I read it long
ago and had probably forgotten about it. Just in case anyone is interested ...

Thanks,

S.R.

Ramesh Shankar wrote:
> 
> I guess I found the answer :-#. The final computation for RTO:
> 
> RTO = t_srtt/8 + t_rttvar
> 
> in the TCP/IP Illustrated vol2 has now become in NetBSD and FreeBSD:
> 
> RTO = (t_srtt/8 + t_rttvar)/4
> 
> This would probably offset the initial multiplication by 4 of the RTT.
> 
> Thanks,
> 
> S.R.
> 
> Ramesh Shankar wrote:
> >
> > It appears that NetBSD and FreeBSD's RTO calculation (from SRTT and
> > RTTVAR) is different from the Van Jacbson's paper and the code described
> > in TCP/IP Illustrated vol2.
> >
> > TCP/IP illustrated vol 2 has the calculation for delta as:
> >
> > delta = rtt - 1 - (tp->t_srtt >> TCP_RTT_SHIFT);
> >
> > where TCP_RTT_SHIFT is 3.
> >
> > whereas NetBSD/FreeBSD have an equivalent of:
> >
> > delta = ((rtt - 1) << 2 ) - (t_srtt >> 2);
> >
> > Are the reasons for this change explained in detail in some paper or
> > something like that?
> >
> > Thanks,
> >
> > S.R.
--------------D5F81DBE21EE4D52AB3B9995
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------D5F81DBE21EE4D52AB3B9995--



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 00:34:55 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id AAA24130
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 00:34:54 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id WAA15662
	for tcp-impl-outgoing; Thu, 15 Jun 2000 22:06:03 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id WAA15636
	for <tcp-impl@grc.nasa.gov>; Thu, 15 Jun 2000 22:06:01 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id WAA13176; Thu, 15 Jun 2000 22:06:01 -0400 (EDT)
Message-Id: <200006160206.WAA13176@seraph3.lerc.nasa.gov>
Received: from ertpg14e1.nortelnetworks.com(47.234.0.35) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma013040; Thu, 15 Jun 00 22:05:34 -0400
Received: from zcard00n.ca.nortel.com (actually zcard00n) 
          by ertpg14e1.nortelnetworks.com; Thu, 15 Jun 2000 22:04:27 -0400
Received: from zcard00f.ca.nortel.com ([47.129.30.8]) by zcard00n.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id M0V63619; Thu, 15 Jun 2000 22:04:25 -0400
Received: from zcarh014 (zcarh014.ca.nortel.com [47.23.81.6]) 
          by zcard00f.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id L5HL8S15; Thu, 15 Jun 2000 22:04:25 -0400
Date: Thu, 15 Jun 2000 22:04:05 -0400 (EDT)
X-Sybari-Space: 00000000 00000000 00000000
From: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
Reply-To: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
Subject: Intentional Host Reordering of TCP Fragments
To: tcp-impl@grc.nasa.gov
X-Mailer: Rosa 2.1 HP-UXB.10.20
X-Rosa-Trace: nseddigh@zcarh014 <47.23.81.6>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-ID: <Rosa.HP-UXB.10..2.1.1000615220405.14319X@zcarh014>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by lombok-fi.lerc.nasa.gov id WAA15644
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Recent discussions on this list focused on network
devices reordering packets. How about an end-host
that intentionally reorders some TCP packets?

During testing of Packet Classification capability
on Diffserv-capable routers, one of my colleagues
discovered that the Linux TCP sender intentionally
reorders fragments - the pkt containing the TCP port
number is the last pkt sent in a family (same ip_id)
of fragments.

The implication of such an implementation is that
Layer-4 packet classification of fragments 
is virtually impossible - unless routers try to 
cache packets....not a desirable solution.
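
To illustrate why a classifier has to hold state in this case, here is a
minimal Python sketch. It assumes fragments are grouped by a family key
such as (source, destination, protocol, ip_id) and that only the fragment
at offset 0 carries the TCP ports; the class and method names are
hypothetical and not taken from any router implementation:

```python
from collections import defaultdict


class FragmentClassifier:
    """Toy Layer-4 classifier that caches fragments until ports are known."""

    def __init__(self):
        self.ports = {}                    # family -> (src_port, dst_port)
        self.pending = defaultdict(list)   # family -> cached fragment offsets

    def receive(self, family, offset, ports=None):
        """Return the (offset, ports) pairs that can be classified now."""
        if offset == 0:
            # First fragment: it carries the TCP header, so ports are known.
            self.ports[family] = ports
            cached = self.pending.pop(family, [])
            return [(off, ports) for off in cached] + [(0, ports)]
        if family in self.ports:
            # Ports were already learned from the first fragment.
            return [(offset, self.ports[family])]
        # Ports unknown: the fragment has to be cached (the undesirable case).
        self.pending[family].append(offset)
        return []


clf = FragmentClassifier()
family = ("10.0.0.1", "10.0.0.2", 6, 4242)   # src, dst, protocol, ip_id
print(clf.receive(family, 185))               # later fragment sent first
print(clf.receive(family, 0, (1234, 80)))     # fragment with ports sent last
```

When the fragment with the ports is sent last, every earlier fragment of
the family sits in the cache until it shows up. The sketch never evicts
state, which a real device would have to do.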

Of course one could argue that most fragments on
the net are UDP and not TCP but nonetheless:
shouldn't end-hosts avoid intentional reordering?

We checked out various RFCs and didn't find explicit 
prohibitions on an end-host reordering the pkts 
that it sends out. 

Best,
---
Nabil Seddigh
nseddigh@nortelnetworks.com





From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 00:59:03 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id AAA24294
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 00:59:03 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id WAA16513
	for tcp-impl-outgoing; Thu, 15 Jun 2000 22:23:18 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id WAA16507
	for <tcp-impl@grc.nasa.gov>; Thu, 15 Jun 2000 22:23:16 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id WAA14902; Thu, 15 Jun 2000 22:23:15 -0400 (EDT)
Received: from r94aag002979.sbo-smr.ma.cable.rcn.com(209.6.183.136) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma014891; Thu, 15 Jun 00 22:23:09 -0400
Received: (from mycroft@localhost)
	by lop-nor.ihack.net (8.10.1/8.8.8) id e5G2J9v00206;
	Thu, 15 Jun 2000 22:19:09 -0400 (EDT)
Date: Thu, 15 Jun 2000 22:19:09 -0400 (EDT)
Message-Id: <200006160219.e5G2J9v00206@lop-nor.ihack.net>
X-Authentication-Warning: lop-nor.ihack.net: mycroft set sender to root@ihack.net using -f
From: "Charles M. Hannum" <root@ihack.net>
To: Ramesh Shankar <RShankar@novell.com>
Cc: tcp-impl@grc.nasa.gov
Subject: Re: RTO calcuation in NetBSD/FreeBSD
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


Just as a point of information:

I did indeed make this change in NetBSD based on the B&P paper.  (I
believe that was the first time anyone had actually used any of their
suggestions in a production system.  It's amazing how people utterly
fail to read the papers that are published...)  We do NOT implement
the rest of `TCP Vegas', however.



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 06:50:19 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA08541
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 06:50:19 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id EAA04572
	for tcp-impl-outgoing; Fri, 16 Jun 2000 04:04:51 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id EAA04555
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 04:04:49 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id EAA20146; Fri, 16 Jun 2000 04:04:49 -0400 (EDT)
Received: from ren.netconnect.com.au(203.7.198.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma020127; Fri, 16 Jun 00 04:04:33 -0400
Received: (qmail 18287 invoked from network); 16 Jun 2000 08:04:32 -0000
Received: from unknown (HELO cvs.com.au) (203.87.14.203)
  by mail.netconnect.com.au with SMTP; 16 Jun 2000 08:04:32 -0000
Message-ID: <3949A10F.F1847DD1@cvs.com.au>
Date: Fri, 16 Jun 2000 13:37:51 +1000
From: Charles Esson <charlese@cvs.com.au>
X-Mailer: Mozilla 4.5 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
CC: tcp-impl@grc.nasa.gov
Subject: Re: Intentional Host Reordering of TCP Fragments
References: <200006160206.WAA13176@seraph3.lerc.nasa.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

<rant>
Doesn't defragmentation belong in the IP layer? If you make that
assumption, then you could make life easier for that layer by sending
the last fragment first; the receiver then knows what resources are
required to reassemble.

As an outsider looking in, it would seem to me TCP/IP started out as a very
elegant general-purpose protocol. It allowed for out-of-order delivery and
multiple packet delivery, and made no assumption about the packet size that
the underlying medium could handle. It has become more and more complex,
and less and less robust (complexity brings bugs), as people try to get
additional speed out of systems that don't need a protocol that is as
robust as TCP/IP was.

Fast retransmit, for example, makes assumptions that are just not valid
in the general case.

I am also a little confused as to why out-of-order packet delivery is
an issue, given the ever-decreasing cost of processors. It costs very
little, for example, to have a NIC reassemble fragmented IP packets before
delivery to the general-purpose CPU.

Granted, sorting out the IP order is a little more complex, but even
there, if rules were made to have the IP identifier indicate the order,
it would not have been a big deal; the NIC could have handled it. But
that was not to be.

If you have to move to the TCP sequence number to sort out packet order
then surely all is lost.
1) TCP is one of many possible protocols handled by IP.
2) You are using in a low layer something that has nothing to do with
that layer, more complexity more bugs.

If you send packets in parallel, surely it would be better to add a
sequence number when dividing the input stream up, then use and remove
that sequence number when putting it back together, assuming you have
control over how the stream is put back together.

Given the way the US patent system is going, it is probably something as
simple and obvious as this that is about to be patented. Perhaps this is
today's solution, but not tomorrow's.

If you haven't got control of reassembly, then I would have thought that
was the very application IP was initially designed to handle, and the
bits added as time has passed are the problem.
</rant>




Nabil Seddigh wrote:

> Recent discussions on this list focused on network
> devices reordering packets. How about an end-host
> that intentionally reorders some TCP packets?
>
> During testing of Packet Classification capability
> on for Diffserv-capable routers, one of my colleagues
> discovered that the Linux TCP sender intentionally
> reorders fragments - the pkt containing the TCP port
> number is the last pkt sent in a family (same ip_id)
> of fragments.
>
> The implication of such an implementation is that
> Layer-4 packet classification of fragments
> is virtually impossible - unless routers try to
> cache packets....not a desirable solution.
>
> Of course one could argue that most fragments on
> the net are UDP and not TCP but nonetheless:
> shouldn't end-hosts avoid intentional reordering?
>
> We checked out various RFCs and didn't find explicit
> prohibitions on an end-host reordering the pkts
> that it sends out.
>
> Best,
> ---
> Nabil Seddigh
> nseddigh@nortelnetworks.com



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 08:39:25 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id IAA10682
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 08:39:24 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id GAA10136
	for tcp-impl-outgoing; Fri, 16 Jun 2000 06:17:37 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id GAA10132
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 06:17:36 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id GAA03878; Fri, 16 Jun 2000 06:17:36 -0400 (EDT)
Received: from pop.atlas.cz(195.119.187.150) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma003867; Fri, 16 Jun 00 06:17:18 -0400
Received: from mail pickup service by relay.atlas.cz with Microsoft SMTPSVC;
	 Fri, 16 Jun 2000 09:18:54 +0200
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33]) by relay.atlas.cz  with Microsoft SMTPSVC(5.5.1877.357.35);
	 Wed, 14 Jun 2000 10:25:39 +0200
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id XAA17083
	for tcp-impl-outgoing; Tue, 13 Jun 2000 23:34:38 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id XAA17075
	for <tcp-impl@grc.nasa.gov>; Tue, 13 Jun 2000 23:34:36 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id XAA06124; Tue, 13 Jun 2000 23:34:35 -0400 (EDT)
Received: from f95.law8.hotmail.com(216.33.241.95) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma006113; Tue, 13 Jun 00 23:34:31 -0400
Received: (qmail 96737 invoked by uid 0); 14 Jun 2000 03:34:30 -0000
Message-ID: <20000614033430.96736.qmail@hotmail.com>
Received: from 149.149.39.112 by www.hotmail.com with HTTP;
	Tue, 13 Jun 2000 20:34:30 PDT
X-Originating-IP: [149.149.39.112]
From: "Srinivas Kurla" <kurla@hotmail.com>
To: tcp-impl@grc.nasa.gov
Subject: Re: Tracing TCP's cwnd
Date: Tue, 13 Jun 2000 22:34:30 CDT
Mime-Version: 1.0
Content-Type: text/plain; format=flowed
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hi,
  Could anyone please tell me how to use the ns-2.1b6 file
../tcl/test/misc_source.tcl to trace TCP's congestion window?
I am sourcing this file from another file
(../tcl/ex/test-suite.tcl) but am unable to get the required output.
Thank you,

truly,
Srinivas Kurla


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 10:27:30 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id KAA12713
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 10:27:30 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id HAA14825
	for tcp-impl-outgoing; Fri, 16 Jun 2000 07:32:40 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id HAA14811
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 07:32:38 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id HAA11662; Fri, 16 Jun 2000 07:32:36 -0400 (EDT)
Received: from ertpg14e1.nortelnetworks.com(47.234.0.35) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma011560; Fri, 16 Jun 00 07:32:07 -0400
Received: from zcard00n.ca.nortel.com (actually zcard00n) 
          by ertpg14e1.nortelnetworks.com; Fri, 16 Jun 2000 07:30:40 -0400
Received: from zcard00p.ca.nortel.com ([47.141.0.104]) 
          by zcard00n.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id M0V6P91X; Fri, 16 Jun 2000 07:30:39 -0400
Received: from pcard38c.ca.nortel.com ([47.23.82.29]) by zcard00p.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id LQHBNLWA; Fri, 16 Jun 2000 07:30:37 -0400
Date: Fri, 16 Jun 2000 06:47:55 -0400 (EDT)
From: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
X-Sender: hadi@PCARD38C.ca.nortel.com
Reply-To: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
To: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
cc: tcp-impl@grc.nasa.gov
Subject: Re: Intentional Host Reordering of TCP Fragments
In-Reply-To: <200006160206.WAA13176@seraph3.lerc.nasa.gov>
Message-ID: <Pine.LNX.4.21.0006160639330.3408-100000@PCARD38C.ca.nortel.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


On Thu, 15 Jun 2000, Seddigh, Nabil  wrote:

> During testing of Packet Classification capability
> on for Diffserv-capable routers, one of my colleagues
> discovered that the Linux TCP sender intentionally
> reorders fragments - the pkt containing the TCP port
> number is the last pkt sent in a family (same ip_id)
> of fragments.
> 
> The implication of such an implementation is that
> Layer-4 packet classification of fragments 
> is virtually impossible - unless routers try to 
> cache packets....not a desirable solution.
> 
> Of course one could argue that most fragments on
> the net are UDP and not TCP but nonetheless:
> shouldn't end-hosts avoid intentional reordering?
> 

Fix your classifier.

Linux's "intentional packet re-ordering" of fragments is because that
is more efficient for Linux to do (and a more natural thing to do for
network ordering of the bytes in the packet).

From a philosophical point of view, why should an end system care about
your classifier?

cheers,
jamal



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 11:36:24 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id LAA14636
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 11:36:24 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id JAA25924
	for tcp-impl-outgoing; Fri, 16 Jun 2000 09:02:40 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id JAA25917
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 09:02:39 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id JAA23378; Fri, 16 Jun 2000 09:02:38 -0400 (EDT)
Received: from prv-mail20.provo.novell.com(137.65.81.122) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma023356; Fri, 16 Jun 00 09:02:26 -0400
Received: from INET-PRV-Message_Server by prv-mail20.provo.novell.com
	with Novell_GroupWise; Fri, 16 Jun 2000 07:02:12 -0600
Message-Id: <s949d0f4.025@prv-mail20.provo.novell.com>
X-Mailer: Novell GroupWise Internet Agent 5.5.3.1
Date: Fri, 16 Jun 2000 07:01:56 -0600
From: "Narsimharao Nagampalli" <NNARASIMHARAO@novell.com>
To: <charlese@cvs.com.au>
Cc: <tcp-impl@grc.nasa.gov>
Subject: Re: Intentional Host Reordering of TCP Fragments
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by lombok-fi.lerc.nasa.gov id JAA25919
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

QoS, server load balancing solutions, and filters would be other examples where such a re-ordering would be a problem.

I hope the flow label in IPv6 will be something that comes to the rescue here. If the definition of the flow label can be driven to contain information from which the port numbers can be extracted, it would help L4 switches a lot.

-Narsi

Nabil Seddigh wrote:

> Recent discussions on this list focused on network
> devices reordering packets. How about an end-host
> that intentionally reorders some TCP packets?
>
> During testing of Packet Classification capability
> on for Diffserv-capable routers, one of my colleagues
> discovered that the Linux TCP sender intentionally
> reorders fragments - the pkt containing the TCP port
> number is the last pkt sent in a family (same ip_id)
> of fragments.
>
> The implication of such an implementation is that
> Layer-4 packet classification of fragments
> is virtually impossible - unless routers try to
> cache packets....not a desirable solution.
>
> Of course one could argue that most fragments on
> the net are UDP and not TCP but nonetheless:
> shouldn't end-hosts avoid intentional reordering?
>
> We checked out various RFCs and didn't find explicit
> prohibitions on an end-host reordering the pkts
> that it sends out.
>
> Best,
> ---
> Nabil Seddigh
> nseddigh@nortelnetworks.com 




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 12:04:19 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id MAA15437
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 12:04:18 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id JAA28134
	for tcp-impl-outgoing; Fri, 16 Jun 2000 09:15:26 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id JAA28093
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 09:15:23 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id JAA25162; Fri, 16 Jun 2000 09:15:22 -0400 (EDT)
Received: from aland.bbn.com(204.162.9.10) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma025091; Fri, 16 Jun 00 09:14:39 -0400
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id JAA22742;
	Fri, 16 Jun 2000 09:14:24 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Message-Id: <200006161314.JAA22742@aland.bbn.com>
To: Charles Esson <charlese@cvs.com.au>
cc: tcp-impl@grc.nasa.gov
Subject: Re: Intentional Host Reordering of TCP Fragments 
In-reply-to: Your message of "Fri, 16 Jun 2000 13:37:51 +1000."
             <3949A10F.F1847DD1@cvs.com.au> 
Date: Fri, 16 Jun 2000 09:14:24 -0400
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


In message <3949A10F.F1847DD1@cvs.com.au>, Charles Esson writes:

><rant>

A fun rant and it has lots of things we could argue about.  But there is
one point where I think I can shed some light (rather than just rant back :-)).

>If  you send packets in parallel surly it would be better to add a
>sequence number  when dividing the input stream up, then use and remove
>that sequence number when putting it back together, assuming you have
>control over how the stream is put back together.

The problem is much more complex than simply ordering.  Suppose we
did this -- just put a sequence number on every packet at the link level
when we went parallel, and then put it back together at the next hop.  (You
can do this analysis for reordering at the end, too, just simpler this way).

OK, so consider the following situation:


Single link A -> Splitter -> 2 parallel links -> Ordering Node -> Single Link B

Further suppose that single link A is very fast (much much faster than the
downstream links).  This assumption just makes the analysis more clear (you
can have A be closer in speed).  Second, assume the rate of Link B is
R, and the rate of the 2 parallel links is R/2 each (so R in aggregate).

Now imagine that you receive a packet stream of the form, one maximum size
packet, called M, followed by 1 minimum size packet, called m.

Assume we put sequence numbers on each packet but that their cost is
negligible.

Now, look at the link performance.

At the splitter, we get M, and start sending it on one of the parallel lines.
Ignoring delay, M will take time (2M/R) to transmit.  Packet m arrives
almost immediately after M (recall link A is fast) and gets sent on the
second line.  Packet m takes time (2m/R).  So packet m arrives first.
Indeed, since m is the minimum packet size and M is the max, the last
bit of m arrives around the time the m-th bit of M arrives.

We want to put packets back in order before sending on link B.  So packet
m has to wait and link B is left idle.  The idle time is the remaining
time for packet M to arrive (which is roughly (2(M-m)/R)).  At that point
we have all of M, and can send M and then m along.

It is useful to figure out how long it takes to send packet m from the time
it enters the splitter until the time it leaves the ordering node.

If the parallel links were one single link, it would take time (M+m)/R to send
m, because it has to wait for M to serialize and then serialize itself.  But
because we've gone parallel AND chosen to delay m until M arrives to send in
order, the time is (3M+m)/R.  So it takes roughly THREE times as long to get m
from one end to the other, all because we put sequence numbers on.
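
The arithmetic above can be checked with a small Python sketch.  It assumes
negligible propagation delay and sequence-number overhead, as in the analysis;
the function names and the example sizes (1500-byte M, 40-byte m, R of
1 Mbit/s) are illustrative choices, not values from this thread:

```python
def single_link_time(M, m, R):
    # m waits for M to serialize at rate R, then serializes itself.
    return (M + m) / R


def parallel_ordered_time(M, m, R):
    # Each parallel link runs at R/2, so a packet of S bits takes 2S/R.
    m_arrives = 2 * m / R
    M_arrives = 2 * M / R
    # The ordering node holds m until all of M has arrived.
    release = max(m_arrives, M_arrives)
    # Link B then sends M followed by m at rate R.
    return release + M / R + m / R


M, m, R = 1500 * 8, 40 * 8, 1_000_000   # bits, bits, bits per second
ratio = parallel_ordered_time(M, m, R) / single_link_time(M, m, R)
print(round(ratio, 2))
```

For these sizes the ratio comes out just under 3, and it approaches 3 as m
shrinks relative to M.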

The exact cost varies -- the key point here is not so much the time delay
as the fact that the ordering node has data (packet m) which is not being sent
while it waits for packet M to arrive.  Unused bits are like unused
hotel room nights: you can't reuse them and they increase the overhead.

That's why most people don't like simple sequence numbers for maintaining
packet order.

Craig


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 13:06:38 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA17310
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 13:06:38 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id KAA06643
	for tcp-impl-outgoing; Fri, 16 Jun 2000 10:12:28 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id KAA06614
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 10:12:25 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id KAA03667; Fri, 16 Jun 2000 10:12:22 -0400 (EDT)
Message-Id: <200006161412.KAA03667@seraph3.lerc.nasa.gov>
Received: from ertpg14e1.nortelnetworks.com(47.234.0.35) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma003602; Fri, 16 Jun 00 10:11:52 -0400
Received: from zcard00n.ca.nortel.com (actually zcard00n) 
          by ertpg14e1.nortelnetworks.com; Fri, 16 Jun 2000 09:58:28 -0400
Received: from zcard00f.ca.nortel.com ([47.129.30.8]) by zcard00n.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id M0V6QVRY; Fri, 16 Jun 2000 09:58:23 -0400
Received: from zcarh014 (zcarh014.ca.nortel.com [47.23.81.6]) 
          by zcard00f.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id L5HL8SYP; Fri, 16 Jun 2000 09:58:23 -0400
Date: Fri, 16 Jun 2000 09:58:02 -0400 (EDT)
X-Sybari-Space: 00000000 00000000 00000000
From: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
Reply-To: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
Subject: Re: Intentional Host Reordering of TCP Fragments
To: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
cc: tcp-impl@grc.nasa.gov
X-Mailer: Rosa 2.1 HP-UXB.10.20
X-Rosa-Trace: nseddigh@zcarh014 <47.23.81.6>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-ID: <Rosa.HP-UXB.10..2.1.1000616095802.14319a@zcarh014>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by lombok-fi.lerc.nasa.gov id KAA06627
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 8bit

Jamal Hadi Salim writes:
>
>Linux's "intentional packet re-ordering" of fragments is because that
>is more efficient for Linux to do 
>

I guess I had already heard this opinion from you ;)
I was also hoping to hear opinions from other folks 
who have been involved in development
of the TCP/IP standards etc....

Seems to me that if network reordering is to be discouraged
then the same principle should apply to end-hosts. Seems
inconsistent to encourage network devices to avoid 
reordering yet allow end-hosts to reorder.

BTW, if you have an algorithm that allows you to do
100% accurate Layer-4 classification (without caching) in 
the face of reordered fragments then I wouldn't mind 
seeing it ;)

Best,
---
Nabil Seddigh
nseddigh@nortelnetworks.com



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 13:25:59 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA17801
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 13:25:58 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id KAA12262
	for tcp-impl-outgoing; Fri, 16 Jun 2000 10:46:16 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id KAA12240
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 10:46:13 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id KAA09175; Fri, 16 Jun 2000 10:46:11 -0400 (EDT)
Received: from ertpg14e1.nortelnetworks.com(47.234.0.35) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma009143; Fri, 16 Jun 00 10:46:07 -0400
Received: from zcard00m.ca.nortel.com (actually zcard00m) 
          by ertpg14e1.nortelnetworks.com; Fri, 16 Jun 2000 10:44:59 -0400
Received: from zcard00p.ca.nortel.com ([47.141.0.104]) 
          by zcard00m.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id M857VGV8; Fri, 16 Jun 2000 10:44:54 -0400
Received: from pcard38c.ca.nortel.com ([47.23.82.29]) by zcard00p.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id LQHBNNTL; Fri, 16 Jun 2000 10:44:53 -0400
Date: Fri, 16 Jun 2000 10:02:11 -0400 (EDT)
X-Sybari-Space: 00000000 00000000 00000000
From: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
X-Sender: hadi@PCARD38C.ca.nortel.com
Reply-To: "Jamal Hadi Salim" <hadi@nortelnetworks.com>
To: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
cc: tcp-impl@grc.nasa.gov, end2end-interest@ISI.EDU
Subject: Re: Intentional Host Reordering of TCP Fragments
In-Reply-To: <200006161325.JAA03630@PCARD38C.ca.nortel.com>
Message-ID: <Pine.LNX.4.21.0006160955110.3640-100000@PCARD38C.ca.nortel.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


On Fri, 16 Jun 2000, Seddigh, Nabil  wrote:

> Jamal Hadi Salim writes:
> >
> >Linux's "intentional packet re-ordering" of fragments is because that
> >is more efficient for Linux to do 
> >
> 
> I guess I had already heard this opinion from you ;)
> I was also hoping to hear opinions from other folks 
> who have been involved in development
> of the TCP/IP standards etc....

Indeed, but it's only fair that other people hear these opinions as
well.

You are doing layer violations and expect systems that conform to
layering to adjust to your sins.
That's the philosophical dilemma.

> Seems to me that if network reordering is to be discouraged
> then the same principle should apply to end-hosts. Seems
> inconsistent to encourage network devices to avoid 
> reordering yet allow end-hosts to reorder.
 
I think maybe this is going out of the scope of tcp-impl, so I am
redirecting it to end2end.


cheers,
jamal



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 13:43:04 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id NAA18367
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 13:43:04 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id LAA18670
	for tcp-impl-outgoing; Fri, 16 Jun 2000 11:21:35 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id LAA18612
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 11:21:30 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id LAA15319; Fri, 16 Jun 2000 11:21:29 -0400 (EDT)
Received: from lightning.swansea.uk.linux.org(194.168.151.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma015272; Fri, 16 Jun 00 11:21:10 -0400
Received: from alan by the-village.bc.nu with local (Exim 2.12 #1)
	id 132xsJ-00067X-00; Fri, 16 Jun 2000 16:17:39 +0100
Subject: Re: Intentional Host Reordering of TCP Fragments
To: NNARASIMHARAO@novell.com (Narsimharao Nagampalli)
Date: Fri, 16 Jun 2000 16:17:37 +0100 (BST)
Cc: charlese@cvs.com.au, tcp-impl@grc.nasa.gov
In-Reply-To: <s949d0f4.025@prv-mail20.provo.novell.com> from "Narsimharao Nagampalli" at Jun 16, 2000 07:01:56 AM
X-Mailer: ELM [version 2.5 PL1]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <E132xsJ-00067X-00@the-village.bc.nu>
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

> During testing of Packet Classification capability
> on for Diffserv-capable routers, one of my colleagues
> discovered that the Linux TCP sender intentionally
> reorders fragments - the pkt containing the TCP port
> number is the last pkt sent in a family (same ip_id)
> of fragments.

We don't reorder fragments; we send packets in an optimised fashion.
I'm surprised nobody else is doing this. It's common sense for any end host
that has to support software TCP checksumming.

You cannot send the head of the packet until you know the checksum, so we
build the fragments in reverse order. The last few fragments are leaving
the Ethernet card before the first fragment is built and the checksum is
computed. It's very good for latency and it's completely valid TCP/IP.
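
The idea described above can be sketched roughly as follows.  This is a
simplified illustration, not the Linux code: the function names, the
`transmit` callback and the sample segment are hypothetical, and the checksum
covers only the segment bytes (a real TCP or UDP checksum also covers a
pseudo-header).

```python
import struct

def ones_complement_sum(data, acc=0):
    """Add the 16-bit big-endian words of data into acc (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"          # pad an odd-length tail with a zero byte
    for (word,) in struct.iter_unpack("!H", data):
        acc += word
    return acc

def fold(acc):
    """Fold carries back in and return the 16-bit one's-complement result."""
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return ~acc & 0xFFFF

def send_reversed(segment, frag_size, csum_offset, transmit):
    """Emit fragments tail-first; the head goes last, carrying the checksum.

    segment:     bytes with a zeroed 2-byte checksum field at csum_offset
    frag_size:   payload bytes per fragment (a multiple of 8, as IP requires)
    transmit:    hypothetical callback(offset, chunk) standing in for the driver
    """
    assert frag_size % 8 == 0
    assert csum_offset + 2 <= min(frag_size, len(segment))
    offsets = list(range(0, len(segment), frag_size))
    acc = 0
    for off in reversed(offsets[1:]):
        chunk = segment[off:off + frag_size]
        acc = ones_complement_sum(chunk, acc)   # touch the data once...
        transmit(off, chunk)                    # ...and send it right away
    head = bytearray(segment[:frag_size])
    acc = ones_complement_sum(bytes(head), acc)
    struct.pack_into("!H", head, csum_offset, fold(acc))
    transmit(0, bytes(head))                    # first fragment leaves last

# Illustrative use: 20 zeroed header bytes plus an arbitrary payload.
sent = []
segment = bytes(20) + bytes(range(256)) * 8
send_reversed(segment, 512, 16, lambda off, chunk: sent.append((off, chunk)))
print([off for off, _ in sent])
```

Because the one's-complement sum is commutative and every fragment starts at
an even offset, the order in which the chunks are summed does not change the
result, so the tail can be summed and sent before the head is finalized.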

The only thing it broke was some older versions of the cisco pix and I believe
that is now fixed on the PIX.

Alan



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 14:14:21 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA19505
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 14:14:20 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id LAA18136
	for tcp-impl-outgoing; Fri, 16 Jun 2000 11:18:39 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id LAA18066
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 11:18:34 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id LAA14778; Fri, 16 Jun 2000 11:18:28 -0400 (EDT)
Received: from pop.atlas.cz(195.119.187.150) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma014709; Fri, 16 Jun 00 11:18:04 -0400
Received: from mail pickup service by pop.atlas.cz with Microsoft SMTPSVC;
	 Fri, 16 Jun 2000 16:12:07 +0200
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33]) by pop.atlas.cz  with Microsoft SMTPSVC(5.5.1877.357.35);
	 Sun, 11 Jun 2000 23:16:18 +0200
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA17422
	for tcp-impl-outgoing; Sun, 11 Jun 2000 12:37:01 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id MAA17406
	for <tcp-impl@grc.nasa.gov>; Sun, 11 Jun 2000 12:37:00 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id MAA22506; Sun, 11 Jun 2000 12:36:59 -0400 (EDT)
Received: from elk.aciri.org(192.150.187.21) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma022479; Sun, 11 Jun 00 12:36:25 -0400
Received: from elk.aciri.org (localhost [127.0.0.1])
	by elk.aciri.org (8.9.3/8.9.3) with ESMTP id JAA55096;
	Sun, 11 Jun 2000 09:35:21 -0700 (PDT)
	(envelope-from floyd@elk.aciri.org)
Message-Id: <200006111635.JAA55096@elk.aciri.org>
To: sankar ramamoorthi <sanka2g@yahoo.com>
cc: tcp-impl@grc.nasa.gov
From: Sally Floyd <floyd@aciri.org>
Subject: Re: network device and tcp-flow packet ordering 
Date: Sun, 11 Jun 2000 09:35:21 -0700
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

I think it is fairly clear that currently, TCP gives abysmal
performance in the presence of significant reordering.  (When the
TCP receiver receives out-of-order packets, the TCP receiver sends
duplicate acknowledgements to tell the TCP sender.  The TCP sender
then does a Fast Retransmit, retransmitting the packet presumed to
be lost, and cutting the congestion window at least in half.  This
is true of Tahoe, Reno, NewReno, and SACK TCP, and, I presume, of
any TCP implementation more recent than 1988.)
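
A toy model of the mechanism described above, as a hedged sketch: packets are
numbered individually rather than by byte, the receiver sends one cumulative
ACK per arrival, and the sender simply halves its window on the third
duplicate ACK.  The names and the threshold constant are assumptions for
illustration; real stacks differ in detail (Tahoe, for example, drops the
window to one segment).

```python
DUPACK_THRESHOLD = 3  # assumed fast-retransmit trigger

def receiver_acks(arrivals):
    """Yield the cumulative ACK (next expected packet number) per arrival."""
    received = set()
    next_expected = 0
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        yield next_expected

def sender_reaction(acks, cwnd):
    """Return (cwnd, retransmitted) after processing a stream of ACKs."""
    last_ack, dupacks, retransmitted = None, 0, []
    for ack in acks:
        if ack == last_ack:
            dupacks += 1
            if dupacks == DUPACK_THRESHOLD:
                retransmitted.append(ack)   # packet presumed lost
                cwnd = max(cwnd // 2, 1)    # window cut in half
        else:
            last_ack, dupacks = ack, 0
    return cwnd, retransmitted

# Packet 1 is merely delayed behind packets 2, 3 and 4; nothing is lost.
reordered = sender_reaction(receiver_acks([0, 2, 3, 4, 1, 5]), cwnd=8)
in_order = sender_reaction(receiver_acks(range(6)), cwnd=8)
print("reordered:", reordered)
print("in order: ", in_order)
```

In the reordered run nothing was dropped, yet the sender retransmits packet 1
and halves its window, which is the performance problem being described.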

I believe that the first step in making TCP more robust to reordering
is in the D-SACK (duplicate-SACK) extension to SACK, "An Extension
to the Selective Acknowledgement (SACK) Option for TCP",
"http://search.ietf.org/internet-drafts/draft-floyd-sack-00.txt".
This has already been approved by the IESG for Proposed Standard,
and is on the RFC editor's to-do queue.

I have a draft paper, "A Report on Some Recent Developments in TCP
Congestion Control", that discusses how the D-SACK option could be
used to make TCP more robust to reordering.  I am appending an
excerpt from that paper below.  As the excerpt makes clear, there
is a significant amount of work that would have to be done to take
the information in the D-SACK option and come out with viable,
tested algorithms that allow TCP to be robust to persistent
reordering...

- Sally
--------------------------------
http://www.aciri.org/floyd/
--------------------------------

From "A Report on Some Recent Developments in TCP
Congestion Control":

An initial step towards adding robustness in the presence of
unnecessary Retransmit Timeouts and Fast Retransmits is to give
the TCP sender the information to determine when an unnecessary
Retransmit Timeout or Fast Retransmit has occurred.  This first
step has been accomplished with the D-SACK (for duplicate-SACK)
extension \cite{FMMPR99} that has recently been added to the SACK
TCP option.  The D-SACK extension allows the TCP data receiver to
use the SACK option to report the receipt of duplicate segments.
With the use of D-SACK, the TCP sender can correctly infer the 
segments that have been received by the data receiver, including
duplicate segments. 

When the sender has retransmitted a packet, D-SACK does not allow
TCP to distinguish between the receipt at the receiver of both the
original and retransmitted packet, and the receipt of two copies  
of the retransmitted packet, one of which was duplicated in the
network.  If necessary, TCP's timestamp option could be used to
distinguish between these two cases \cite{AP99,L99}.  However, in
an environment with minimal packet replication in the network,
D-SACK allows the TCP sender to make reasonable inferences, one
round-trip time after a packet has been retransmitted, about whether
the retransmission was necessary or unnecessary.
    
If the TCP data sender determines, a round-trip time after
retransmitting a packet, that the receiver received two copies of
that segment and therefore that the packet retransmission was most
likely unnecessary, then the sender could have the option of
``undoing'' the halving in the congestion window.  The sender can
``undo'' the recent halving of the congestion window by increasing
the Slow-Start threshold ssthresh to the previous value of the old
congestion window, effectively slow-starting until the congestion
window has reached its old value.  In addition to restoring the
congestion window, the TCP sender could adjust the duplicate
acknowledgement threshold or the retransmit timeout parameters, to
avoid the wasted bandwidth of persistent unnecessary retransmits.
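
The "undo" described in the paragraph above might look like the following
sketch.  It is only an illustration of the idea in the excerpt: the class,
its method names and the initial ssthresh are hypothetical, windows are
counted in packets, and the adjustment of the duplicate acknowledgement
threshold is left out.

```python
class Sender:
    """Toy congestion state: only cwnd and ssthresh, in packets."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = 64          # assumed initial threshold
        self.prior_cwnd = None      # remembered for a possible undo
        self._ca_acks = 0           # ACKs counted in congestion avoidance

    def on_fast_retransmit(self):
        """Halve the window, remembering the old value."""
        self.prior_cwnd = self.cwnd
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh

    def on_dsack_for_retransmit(self):
        """The receiver reported a duplicate of the retransmitted segment,
        so the retransmission was most likely unnecessary."""
        if self.prior_cwnd is not None:
            self.ssthresh = self.prior_cwnd   # slow-start back to old window
            self.prior_cwnd = None

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                    # slow start: +1 per ACK
        else:
            self._ca_acks += 1                # congestion avoidance:
            if self._ca_acks >= self.cwnd:    # +1 per window of ACKs
                self.cwnd += 1
                self._ca_acks = 0

with_undo, without_undo = Sender(cwnd=20), Sender(cwnd=20)
for s in (with_undo, without_undo):
    s.on_fast_retransmit()
with_undo.on_dsack_for_retransmit()
for _ in range(10):
    with_undo.on_ack()
    without_undo.on_ack()
print(with_undo.cwnd, without_undo.cwnd)
```

With the undo, ten ACKs bring the window back to its old value by slow
start; without it, the sender is still in congestion avoidance and has grown
by only one packet.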

The first part of this work, providing the information to the sender
about duplicate packets received at the receiver, is done with the
D-SACK extension.  The next step is to evaluate specific mechanisms
for identifying an unnecessary halving of the congestion window,
and for adjusting the duplicate acknowledgement threshold or
retransmit timeout parameters.  Once this is done, there is no
fundamental reason why TCP congestion control cannot perform
effectively in an environment with persistent reordering.



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 14:34:37 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA20274
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 14:34:36 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id LAA20409
	for tcp-impl-outgoing; Fri, 16 Jun 2000 11:31:19 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id LAA20381
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 11:31:16 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id LAA17158; Fri, 16 Jun 2000 11:31:16 -0400 (EDT)
Received: from pop.atlas.cz(195.119.187.150) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma016718; Fri, 16 Jun 00 11:30:15 -0400
Received: from mail pickup service by pop.atlas.cz with Microsoft SMTPSVC;
	 Fri, 16 Jun 2000 16:26:14 +0200
Received: from relay.atlas.cz ([195.119.187.150]) by pop.atlas.cz  with Microsoft SMTPSVC(5.5.1877.357.35);
	 Mon, 12 Jun 2000 02:19:20 +0200
Received: from lombok-fi.lerc.nasa.gov ([139.88.112.33]) by relay.atlas.cz  with Microsoft SMTPSVC(5.5.1877.357.35);
	 Mon, 12 Jun 2000 01:54:16 +0200
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA22782
	for tcp-impl-outgoing; Sun, 11 Jun 2000 15:19:45 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id PAA22778
	for <tcp-impl@grc.nasa.gov>; Sun, 11 Jun 2000 15:19:44 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA07040; Sun, 11 Jun 2000 15:19:44 -0400 (EDT)
Message-Id: <200006111919.PAA07040@seraph3.lerc.nasa.gov>
Received: from be.be.com(208.243.144.2) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma007037; Sun, 11 Jun 00 15:19:43 -0400
Received: (qmail 14868 invoked from network); 11 Jun 2000 19:29:32 -0000
Received: from be.be.com (HELO c225894-b.be.com) (10.0.0.2)
  by mail.be.com with SMTP; 11 Jun 2000 19:29:32 -0000
To: "Eric A. Hall" <ehall@ehsco.com>
Subject: Re: network device and tcp-flow packet ordering
Cc: tcp-impl@grc.nasa.gov
Date: Sun, 11 Jun 2000 12:13:09 GMT
From: "Howard Berkey" <howard@be.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Reply-To: howard@be.com
X-Mailer: BeOS Mail
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

>Such a product would likely end up becoming self-selective in its market.
>Maybe technically the product can reorder data all it wants, but the market
>won't want a product that does it.

With good reason.


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 16:03:46 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA22863
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 16:03:46 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id NAA09173
	for tcp-impl-outgoing; Fri, 16 Jun 2000 13:30:09 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id NAA09150
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 13:30:08 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id NAA09722; Fri, 16 Jun 2000 13:30:07 -0400 (EDT)
Received: from palrel1.hp.com(156.153.255.242) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma009644; Fri, 16 Jun 00 13:29:52 -0400
Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.8.80.176])
	by palrel1.hp.com (Postfix) with ESMTP id 5E2CBA9C
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 10:29:50 -0700 (PDT)
Received: from cup.hp.com (raj@localhost [127.0.0.1])
	by tardy.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id KAA15442
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 10:28:39 -0700 (PDT)
Message-ID: <394A63C7.62BD5553@cup.hp.com>
Date: Fri, 16 Jun 2000 10:28:39 -0700
From: Rick Jones <raj@cup.hp.com>
Organization: the Unofficial HP
X-Mailer: Mozilla 4.7 [en] (X11; U; HP-UX B.11.00 9000/785)
X-Accept-Language: en
MIME-Version: 1.0
To: tcp-impl@grc.nasa.gov
Subject: Re: Intentional Host Reordering of TCP Fragments
References: <200006160206.WAA13176@seraph3.lerc.nasa.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit


I thought that end hosts were supposed to avoid fragmenting TCP
datagrams in the first place :)

Sending the first fragment last is not _that_ unknown. For example, it
is done for the HP-PB FDDI NIC from many years ago. In that case it was
done for UDP (though I guess it would be done for TCP too if TCP
segments were fragmented) - the reason? The card basically had a
checksum accumulator - accumulate the checksum over the relevant
portions of the rest of the fragments and then insert it into the
header at the end. That way one does not have a need to DMA the entire
fragment chain into the NIC before the offloaded checksum can take
place.

Basically, it is a way to do a "trailer checksum" without actually doing
a trailer checksum.

rick jones

Nabil Seddigh wrote:
> 
> Recent discussions on this list focused on network
> devices reordering packets. How about an end-host
> that intentionally reorders some TCP packets?
> 
> During testing of Packet Classification capability
> on for Diffserv-capable routers, one of my colleagues
> discovered that the Linux TCP sender intentionally
> reorders fragments - the pkt containing the TCP port
> number is the last pkt sent in a family (same ip_id)
> of fragments.
> 
> The implication of such an implementation is that
> Layer-4 packet classification of fragments
> is virtually impossible - unless routers try to
> cache packets....not a desirable solution.
> 
> Of course one could argue that most fragments on
> the net are UDP and not TCP but nonetheless:
> shouldn't end-hosts avoid intentional reordering?
> 
> We checked out various RFCs and didn't find explicit
> prohibitions on an end-host reordering the pkts
> that it sends out.
> 
> Best,
> ---
> Nabil Seddigh
> nseddigh@nortelnetworks.com

-- 
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 17:57:44 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA25932
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 17:57:44 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA26112
	for tcp-impl-outgoing; Fri, 16 Jun 2000 15:25:07 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id PAA26073
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 15:25:04 -0400 (EDT)
From: kuznet@ms2.inr.ac.ru
Received: by seraph3.lerc.nasa.gov; id PAA29372; Fri, 16 Jun 2000 15:25:03 -0400 (EDT)
Received: from minus.inr.ac.ru(193.233.7.97) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma029301; Fri, 16 Jun 00 15:24:31 -0400
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA15245; Fri, 16 Jun 2000 23:24:24 +0400
Message-Id: <200006161924.XAA15245@ms2.inr.ac.ru>
Subject: Re: Intentional Host Reordering of TCP Fragments
To: nseddigh@nortelnetworks.com
Date: Fri, 16 Jun 2000 23:24:24 +0400 (MSK DST)
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: <200006161412.KAA03667@seraph3.lerc.nasa.gov> from "Nabil Seddigh" at Jun 16, 0 09:58:02 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello!

> Seems to me that if network reordering is to be discouraged
> then the same principle should apply to end-hosts. Seems
> inconsistent to encourage network devices to avoid 
> reordering yet allow end-hosts to reorder.

I apologize, the word "reordering" applies to the cases
when some order ever existed. 8) There is no order for fragments,
but that one, which sending host generated.

Change subject line, please.


> BTW, if you have an algorithm that allows you to do
> 100% accurate Layer-4 classification (without caching) 

100% accurate algorithm, which classifies IP packets using keys
contained in payload, does not exist, cannot exist and must not exist.
It is in contradiction with basic principles of IP.

If such pseudo-router really wants to do this, it must:

1. Be sure that no other paths from the source to the destination exist.
2. Gather all the fragments before doing any decisions. The situation,
   when different fragments get different QoS must be excluded
   __completely__, otherwise this QoS loses a sense.
   BTW, moving closer to the subject, only after this it should
   issue them in the same order, which they were received in. 8)8)8)
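
Step 2 above can be sketched as below.  This is a minimal illustration under
stated assumptions: fragments are plain dicts with hypothetical keys, step 1
(a single path) is taken for granted, and there is no reassembly timeout or
handling of overlapping fragments, both of which a real device would need.

```python
from collections import defaultdict

class FragmentClassifier:
    """Hold the fragments of a datagram until all of them have arrived.

    Fragments are dicts with hypothetical keys: src, dst, proto, ip_id,
    offset and length (in bytes), more (the MF flag) and, on the
    offset-0 fragment only, dport.
    """

    def __init__(self, classify):
        self.classify = classify            # maps dport -> QoS class
        self.pending = defaultdict(list)    # datagram key -> held fragments

    @staticmethod
    def _complete(held):
        expected = 0
        for f in sorted(held, key=lambda f: f["offset"]):
            if f["offset"] != expected:
                return False                # hole in the datagram
            expected += f["length"]
            last = f
        return not last["more"]             # final fragment has MF clear

    def receive(self, frag):
        """Return a list of (qos_class, fragment) pairs ready to forward."""
        key = (frag["src"], frag["dst"], frag["proto"], frag["ip_id"])
        held = self.pending[key]
        held.append(frag)
        if not self._complete(held):
            return []                       # keep waiting, forward nothing
        del self.pending[key]
        first = next(f for f in held if f["offset"] == 0)
        qos = self.classify(first["dport"])
        return [(qos, f) for f in held]     # same order as received

def frag(offset, more, **extra):
    base = {"src": "a", "dst": "b", "proto": 6, "ip_id": 7,
            "offset": offset, "length": 8, "more": more}
    return {**base, **extra}

clf = FragmentClassifier(lambda dport: "gold" if dport == 80 else "best-effort")
# Fragments arrive tail-first, as sent by the host discussed in this thread.
out = [clf.receive(frag(16, False)),
       clf.receive(frag(8, True)),
       clf.receive(frag(0, True, dport=80))]
print([[(q, f["offset"]) for q, f in batch] for batch in out])
```

Every fragment of the datagram gets the same class, and nothing is forwarded
until the last missing piece arrives, which is exactly the caching cost the
original poster wanted to avoid.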

Alexey


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 19:06:41 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA27697
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 19:06:41 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id QAA06323
	for tcp-impl-outgoing; Fri, 16 Jun 2000 16:37:59 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id QAA06303
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 16:37:57 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id QAA11270; Fri, 16 Jun 2000 16:37:56 -0400 (EDT)
Received: from tnt.isi.edu(128.9.128.128) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma011246; Fri, 16 Jun 00 16:37:47 -0400
Received: from gra.isi.edu (gra.isi.edu [128.9.160.133])
	by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id NAA06829;
	Fri, 16 Jun 2000 13:37:46 -0700 (PDT)
From: Bob Braden <braden@ISI.EDU>
Received: (from braden@localhost)
	by gra.isi.edu (8.8.7/8.8.6) id UAA00552;
	Fri, 16 Jun 2000 20:37:45 GMT
Date: Fri, 16 Jun 2000 20:37:45 GMT
Message-Id: <200006162037.UAA00552@gra.isi.edu>
To: nseddigh@nortelnetworks.com, kuznet@ms2.inr.ac.ru
Subject: Re: Intentional Host Reordering of TCP Fragments
Cc: tcp-impl@grc.nasa.gov
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

  *> 
  *> I apologize, the word "reordering" applies to the cases
  *> when some order ever existed. 8) There is no order for fragments,
  *> but that one, which sending host generated.
  *> 

This statement is, at best, misleading.  Of course there is a natural
order for IP fragments, big-endian like all other network transmission!

It seems curious to me that an OS can create the fragments in reverse
order without having all the segment in a buffer at one time, since
applications certainly do not deliver bytes to the kernel in
reverse order.  And if they are all there at one time, you can
compute the checksum over it.  Oh, well.

The Linux approximation to a trailing checksum is only the latest
skirmish in the long-running war over whether TCP trailing checksums
would really help performance (assuming lack of brain damage in
hardware and software).  One of the early battles in that war was the
Berkeley trailer system (BSD was the Linux of the 1980s, remember?),
and community consensus eventually killed it as a Bad Idea (see RFC
1122 section 2.3.1).  The battle recurs every couple of years on some
large mailing list.  No one is ever convinced, but the consensus
usually seems to emerge that TCP trailing checksums are not very
useful for performance.

  *> 
  *> 
  *> > BTW, if you have an algorithm that allows you to do
  *> > 100% accurate Layer-4 classification (without caching) 
  *> 
  *> 100% accurate algorithm, which classifies IP packets using keys
  *> contained in payload, does not exist, cannot exist and must not exist.
  *> It is in contradiction with basic principles of IP.

Yes. Protocols that need MF classification had better not do any IP
fragmentation.  EG the Int-Serv specs say that explicitly, and they
provide mechanisms to inform the sender of the E2E MTU so it can avoid
fragmentation.

None of this has much to do with TCP, so this discussion is on
the wrong list.

Bob Braden

  *> 
  *> If such pseudo-router really wants to do this, it must:
  *> 
  *> 1. Be sure that no other paths from the source to the destination exist.
  *> 2. Gather all the fragments before doing any decisions. The situation,
  *>    when different fragments get different QoS must be excluded
  *>    __completely__, otherwise this QoS loses a sense.
  *>    BTW, moving closer to the subject, only after this it should
  *>    issue them in the same order, which they were received in. 8)8)8)
  *> 
  *> Alexey
  *> 


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 19:38:53 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA28053
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 19:38:53 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id RAA11196
	for tcp-impl-outgoing; Fri, 16 Jun 2000 17:20:07 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id RAA11183
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 17:20:05 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id RAA17929; Fri, 16 Jun 2000 17:20:05 -0400 (EDT)
Message-Id: <200006162120.RAA17929@seraph3.lerc.nasa.gov>
Received: from be.be.com(208.243.144.2) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma017865; Fri, 16 Jun 00 17:19:58 -0400
Received: (qmail 8740 invoked from network); 16 Jun 2000 21:30:00 -0000
Received: from gpz.be.com (10.113.216.32)
  by mail.be.com with SMTP; 16 Jun 2000 21:30:00 -0000
To: tcp-impl@grc.nasa.gov
Subject: Re: Intentional Host Reordering of TCP Fragments
Date: Fri, 16 Jun 2000 14:24:43 PDT
From: "Howard Berkey" <howard@be.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Reply-To: howard@be.com
X-Mailer: BeOS Mail
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Doing what Linux does re: transmit order of the fragments is perfectly 
valid and makes sense.  It also has no bearing on the earlier 
discussion about out-of-order segments.  Receiving frags out of order
in IP does not imply anything about the order segments are received by 
TCP.

Howard



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 19:59:42 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA28144
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 19:59:41 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id RAA12584
	for tcp-impl-outgoing; Fri, 16 Jun 2000 17:39:39 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id RAA12574
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 17:39:37 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id RAA19875; Fri, 16 Jun 2000 17:39:36 -0400 (EDT)
Received: from granger.mail.mindspring.net(207.69.200.148) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma019852; Fri, 16 Jun 00 17:39:07 -0400
Received: from wilson (vmlabs56.vmlabs.com [204.31.130.56])
	by granger.mail.mindspring.net (8.9.3/8.8.5) with SMTP id RAA07179
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 17:39:06 -0400 (EDT)
Reply-To: <wchung@ix.netcom.com>
From: "Wilson C. Chung" <wchung@ix.netcom.com>
To: <tcp-impl@grc.nasa.gov>
Subject: ISA used in TCP/IP/UDP
Date: Fri, 16 Jun 2000 14:41:25 -0700
Message-ID: <NDBBLECHILBKKKIOKHPKOEALCGAA.wchung@ix.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
In-Reply-To: <200006111635.JAA55096@elk.aciri.org>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

What are the typical instruction set extensions on
a CPU so that implementing TCP/IP/UDP network stacks
would be more efficient?

Tx, Wilson Chung
VMLabs 


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 20:28:33 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA28451
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 20:28:32 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA14302
	for tcp-impl-outgoing; Fri, 16 Jun 2000 18:04:21 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id SAA14298
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 18:04:20 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id SAA22794; Fri, 16 Jun 2000 18:04:20 -0400 (EDT)
Received: from lightning.swansea.uk.linux.org(194.168.151.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma022674; Fri, 16 Jun 00 18:04:00 -0400
Received: from alan by the-village.bc.nu with local (Exim 2.12 #1)
	id 13345H-0006sz-00; Fri, 16 Jun 2000 22:55:27 +0100
Subject: Re: Intentional Host Reordering of TCP Fragments
To: braden@ISI.EDU (Bob Braden)
Date: Fri, 16 Jun 2000 22:55:25 +0100 (BST)
Cc: nseddigh@nortelnetworks.com, kuznet@ms2.inr.ac.ru, tcp-impl@grc.nasa.gov
In-Reply-To: <200006162037.UAA00552@gra.isi.edu> from "Bob Braden" at Jun 16, 2000 08:37:45 PM
X-Mailer: ELM [version 2.5 PL1]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <E13345H-0006sz-00@the-village.bc.nu>
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

> order without having all the segment in a buffer at one time, since
> applications certainly do not deliver bytes to the kernel in
> reverse order.  And if they are all there at one time, you can
> compute the checksum over it.  Oh, well.

So you want to do two passes over the data and then send, rather than one
pass over the data, sending as you go.

L1 cache misses are expensive. I want to touch the data once, since I'm forced
to touch it.  With UDP it can be a big win because there are plenty of
CPUs with small caches, especially the embedded and performance-sensitive ones.

We've been doing it for, I think, 4 or 5 years now.
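
The single-pass idea can be sketched roughly as follows. This is only an
illustration in Python, not the Linux code: the function name
`copy_and_checksum` and the buffer layout are assumptions made for the
example. Each byte is read once, and both the copy and the 16-bit
ones-complement sum used by the IP/TCP/UDP checksums happen in the same loop.

```python
def copy_and_checksum(src: bytes, dst: bytearray, offset: int = 0) -> int:
    """Copy src into dst at offset; return the folded 16-bit ones-complement sum.

    Hypothetical helper for illustration: the data is touched once, and
    the checksum is accumulated while the bytes are being copied.
    """
    total = 0
    n = len(src)
    for i in range(0, n - 1, 2):
        hi, lo = src[i], src[i + 1]
        dst[offset + i] = hi            # the copy ...
        dst[offset + i + 1] = lo
        total += (hi << 8) | lo         # ... and the sum, in the same pass
    if n % 2:                           # odd trailing byte is padded with zero
        dst[offset + n - 1] = src[n - 1]
        total += src[n - 1] << 8
    while total >> 16:                  # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total
```

The value placed in a header would be the bitwise complement of the returned
sum. A two-pass design would run one loop for the copy and a second for the
sum, pulling the data through the cache twice.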

Alan



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 22:01:51 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA00952
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 22:01:51 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA18168
	for tcp-impl-outgoing; Fri, 16 Jun 2000 19:22:01 -0400 (EDT)
Received: from seraph2.lerc.nasa.gov (firewall-user@guardian02.lerc.nasa.gov [139.88.146.11])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id TAA18157
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 19:21:59 -0400 (EDT)
Received: by seraph2.lerc.nasa.gov; id TAA04858; Fri, 16 Jun 2000 19:21:58 -0400 (EDT)
Received: from tnt.isi.edu(128.9.128.128) by seraph2.lerc.nasa.gov via smap (V5.0)
	id xma004779; Fri, 16 Jun 00 19:21:15 -0400
Received: from gra.isi.edu (gra.isi.edu [128.9.160.133])
	by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id QAA00484;
	Fri, 16 Jun 2000 16:21:13 -0700 (PDT)
From: Bob Braden <braden@ISI.EDU>
Received: (from braden@localhost)
	by gra.isi.edu (8.8.7/8.8.6) id XAA00899;
	Fri, 16 Jun 2000 23:21:13 GMT
Date: Fri, 16 Jun 2000 23:21:13 GMT
Message-Id: <200006162321.XAA00899@gra.isi.edu>
To: braden@ISI.EDU, alan@lxorguk.ukuu.org.uk
Subject: Re: Intentional Host Reordering of TCP Fragments
Cc: nseddigh@nortelnetworks.com, kuznet@ms2.inr.ac.ru, tcp-impl@grc.nasa.gov
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


  *> From owner-tcp-impl@lerc.nasa.gov Fri Jun 16 15:39:17 2000
  *> Subject: Re: Intentional Host Reordering of TCP Fragments
  *> To: braden@ISI.EDU (Bob Braden)
  *> Date: Fri, 16 Jun 2000 22:55:25 +0100 (BST)
  *> Cc: nseddigh@nortelnetworks.com, kuznet@ms2.inr.ac.ru, tcp-impl@grc.nasa.gov
  *> In-Reply-To: <200006162037.UAA00552@gra.isi.edu> from "Bob Braden" at Jun 16, 2000 08:37:45 PM
  *> X-Mailer: ELM [version 2.5 PL1]
  *> MIME-Version: 1.0
  *> Content-Transfer-Encoding: 7bit
  *> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
  *> Sender: owner-tcp-impl@lerc.nasa.gov
  *> Precedence: bulk
  *> X-Lines: 16
  *> 
  *> > order without having all the segment in a buffer at one time, since
  *> > applications certainly do not deliver bytes to the kernel in
  *> > reverse order.  And if they are all there at one time, you can
  *> > compute the checksum over it.  Oh, well.
  *> 
  *> So you want to do two passes over the data then send not one pass over the
  *> data sending as you go.
  *> 

I don't think I said two passes, but this is not the time/place to
have this discussion (AGAIN!)

Bob


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 16 22:22:09 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA01046
	for <tcpimpl-archive@odin.ietf.org>; Fri, 16 Jun 2000 22:22:08 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA19419
	for tcp-impl-outgoing; Fri, 16 Jun 2000 19:50:51 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id TAA19408
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 19:50:50 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id TAA03748; Fri, 16 Jun 2000 19:50:50 -0400 (EDT)
Message-Id: <200006162350.TAA03748@seraph3.lerc.nasa.gov>
Received: from smtprch1.nortelnetworks.com(192.135.215.14) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma003725; Fri, 16 Jun 00 19:50:23 -0400
Received: from zcard00n.ca.nortel.com (actually zcard00n) 
          by smtprch1.nortel.com; Fri, 16 Jun 2000 18:48:46 -0500
Received: from zcard00f.ca.nortel.com ([47.129.30.8]) by zcard00n.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id M0V6TYV3; Fri, 16 Jun 2000 19:49:31 -0400
Received: from zcarh014 (zcarh014.ca.nortel.com [47.23.81.6]) 
          by zcard00f.ca.nortel.com 
          with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) 
          id L5HL844W; Fri, 16 Jun 2000 19:49:31 -0400
Date: Fri, 16 Jun 2000 19:49:08 -0400 (EDT)
X-Sybari-Space: 00000000 00000000 00000000
From: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
Reply-To: "Nabil Seddigh" <nseddigh@nortelnetworks.com>
Subject: Re: Intentional Host Reordering of TCP Fragments
To: kuznet@ms2.inr.ac.ru
cc: tcp-impl@grc.nasa.gov
X-Mailer: Rosa 2.1 HP-UXB.10.20
X-Rosa-Trace: nseddigh@zcarh014 <47.23.81.6>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-ID: <Rosa.HP-UXB.10..2.1.1000616194908.14319h@zcarh014>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by lombok-fi.lerc.nasa.gov id TAA19412
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 8bit

Alexey,

>
>I apologize, the word "reordering" applies to the cases
>when some order ever existed. 8) There is no order for fragments,
>but that one, which sending host generated.
>

Sorry. I should have said "reverse ordering" ;) It is not
just 1 or 2 packets reordered but *all* pkts in a fragment
family with the same ip_id. The pkt with the L4 information
is actually sent after all other pkts in that fragment family.

>
>100% accurate algorithm, which classifies IP packets using keys
>contained in payload, does not exist, cannot exist and must not exist.
>It is in contradiction with basic principles of IP.
>

Agreed! However, if you are doing the L4 classification
at the network edge (e.g., an enterprise gateway), then the
probability that the network causes the pkt with fragment_offset 0
to arrive after other fragments is not that high. L4 classification
schemes can accurately classify fragments in this case.

When the end-host intentionally reverse-orders fragments,
it puts the nail in the coffin for the classification scheme:
it ensures *ALL* fragments that need L4 classification will be
incorrectly classified.
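
To make the failure mode concrete, here is a toy Python sketch of such a
classifier. It is an assumption-laden illustration, not any vendor's
implementation: the class name, the use of the destination port as the only
L4 key, and the addresses are all made up for the example. The classifier
learns the port from the fragment with offset 0 and applies it to later
fragments carrying the same (src, dst, protocol, ip_id).

```python
from typing import Dict, List, Optional, Tuple

FlowKey = Tuple[str, str, int, int]  # (src, dst, protocol, ip_id)


class FragmentClassifier:
    """Toy L4 classifier: learns the port from the offset-0 fragment."""

    def __init__(self) -> None:
        self._ports: Dict[FlowKey, int] = {}

    def classify(self, src: str, dst: str, proto: int, ip_id: int,
                 frag_offset: int, dst_port: Optional[int] = None) -> Optional[int]:
        key = (src, dst, proto, ip_id)
        # Only the first fragment (offset 0) carries the L4 header.
        if frag_offset == 0 and dst_port is not None:
            self._ports[key] = dst_port
        # Later fragments can be classified only if the first one was seen.
        return self._ports.get(key)


def classified_count(offsets: List[int]) -> int:
    """Count how many fragments of one datagram get an L4 classification."""
    clf = FragmentClassifier()
    hits = 0
    for off in offsets:
        port = 80 if off == 0 else None
        if clf.classify("192.0.2.1", "192.0.2.2", 6, 42, off, port) is not None:
            hits += 1
    return hits
```

With arrival order `[0, 185, 370]` all three fragments are classified; with
the reversed order `[370, 185, 0]` only the final, offset-0 fragment is, and
every fragment before it falls through unclassified.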

Best,
---
Nabil Seddigh
nseddigh@nortelnetworks.com




From owner-tcp-impl@lerc.nasa.gov  Sat Jun 17 00:28:01 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id AAA03476
	for <tcpimpl-archive@odin.ietf.org>; Sat, 17 Jun 2000 00:28:00 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id VAA23898
	for tcp-impl-outgoing; Fri, 16 Jun 2000 21:43:23 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id VAA23893
	for <tcp-impl@grc.nasa.gov>; Fri, 16 Jun 2000 21:43:22 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id VAA13510; Fri, 16 Jun 2000 21:43:20 -0400 (EDT)
Received: from ren.netconnect.com.au(203.7.198.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma013473; Fri, 16 Jun 00 21:43:01 -0400
Received: (qmail 6028 invoked from network); 17 Jun 2000 01:43:35 -0000
Received: from unknown (HELO cvs.com.au) (203.87.14.203)
  by mail.netconnect.com.au with SMTP; 17 Jun 2000 01:43:35 -0000
Message-ID: <394A9920.3756850A@cvs.com.au>
Date: Sat, 17 Jun 2000 07:16:17 +1000
From: Charles Esson <charlese@cvs.com.au>
X-Mailer: Mozilla 4.5 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
CC: tcp-impl@grc.nasa.gov
Subject: Re: Intentional Host Reordering of TCP Fragments
References: <E132xsJ-00067X-00@the-village.bc.nu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

I can see your point, and have no difficulty seeing that you saved a pass over the
data for all but the first fragment. Quite clever really, but it would make life harder
for those who believe layering can be ignored completely when handling data that
has been put on the wire (and I think those in that school are crazy).

I, however, am in the enforce-layering-absolutely school, and I used objects when
doing my stack. The TCP (and all other protocol) checksums are done by calling
the protocol object, so the call has to be done after you have filled in the
destination and before you fragment.  The concept of fragmentation is unknown to
the protocol object, and it definitely doesn't get involved in putting the data on
the wire.

Mind you, just as I believe defragmentation belongs in the NIC, I believe
fragmentation belongs there also. This is possible if you treat the destination as
something the protocol layer can get with a call to the routing layer (which you
have to do anyway, as the destination address at least has to remain stable even if
the destination has multiple NICs and the path changes), and you do the protocol
checksum calculation in the protocol layer.

I can see, however, that this is not how most people see the world. If the NIC deals
with fragmentation and defragmentation, then there is no penalty if you use 64k IP
packets; in fact things go better, as you only have to deal with one header for
every 64k in the main CPU. If there is no penalty, why would you spend any time
working out the optimum packet size for the initial path?
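
To put a rough number on the 64k case, here is a small Python sketch. The
1500-byte MTU and the 20-byte option-free IP header are assumptions for the
example (neither figure appears above), and `fragment_count` is a
hypothetical helper name.

```python
def fragment_count(total_length: int, mtu: int, ip_header_len: int = 20) -> int:
    """Number of IP fragments needed to carry one datagram over a given MTU.

    Every fragment except the last must carry a multiple of 8 payload
    bytes, because the fragment offset field counts 8-byte units.
    """
    payload = total_length - ip_header_len
    per_fragment = (mtu - ip_header_len) // 8 * 8
    return -(-payload // per_fragment)  # ceiling division


# A maximum-size datagram over an assumed 1500-byte MTU:
print(fragment_count(65535, 1500))
```

Under those assumptions the host CPU builds one header and the NIC would
produce 45 fragments from it, each carrying at most 1480 bytes of payload.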

Regards

Alan Cox wrote:

> > During testing of Packet Classification capability
> > on for Diffserv-capable routers, one of my colleagues
> > discovered that the Linux TCP sender intentionally
> > reorders fragments - the pkt containing the TCP port
> > number is the last pkt sent in a family (same ip_id)
> > of fragments.
>
> We don't reorder fragments we send packets in an optimised fashion.
> Im suprised nobody else is doing this. Its common sense for any end host
> that has to support software tcp checksumming
>



From owner-tcp-impl@lerc.nasa.gov  Sat Jun 17 06:34:54 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA17471
	for <tcpimpl-archive@odin.ietf.org>; Sat, 17 Jun 2000 06:34:53 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id DAA09441
	for tcp-impl-outgoing; Sat, 17 Jun 2000 03:37:29 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id DAA09433
	for <tcp-impl@grc.nasa.gov>; Sat, 17 Jun 2000 03:37:28 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id DAA16153; Sat, 17 Jun 2000 03:37:27 -0400 (EDT)
Received: from ren.netconnect.com.au(203.7.198.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma016090; Sat, 17 Jun 00 03:36:52 -0400
Received: (qmail 6318 invoked from network); 17 Jun 2000 07:37:32 -0000
Received: from unknown (HELO cvs.com.au) (203.87.14.203)
  by mail.netconnect.com.au with SMTP; 17 Jun 2000 07:37:32 -0000
Message-ID: <394ACDB3.ECBEEB66@cvs.com.au>
Date: Sat, 17 Jun 2000 11:00:35 +1000
From: Charles Esson <charlese@cvs.com.au>
X-Mailer: Mozilla 4.5 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
To: tcp-impl@grc.nasa.gov
Subject: Re: Intentional Host Reordering of TCP Fragments
References: <E132xsJ-00067X-00@the-village.bc.nu> <394A9920.3756850A@cvs.com.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

This would make a lot more sense if I had been talking about getting the source address
from the routing layer in a system that has multiple NICs.

Luckily, compilers convinced me I was a fool years ago, so this is not a new discovery.

Charles Esson wrote:



> Mind you; just as I believe defragmentation belongs in the NIC, I believe
> fragmentation belongs there also. This is possible if you treat the destination as
> something the protocol layer can get with a call to the routing layer ( which you
> have to do anyway as the destination address at least has to remain stable even if
> the destination has multiple NICs and the path changes), and you do the protocol
> checksum calculation in the protocol layer.
> > that has to support software tcp checksumming
>



From owner-tcp-impl@lerc.nasa.gov  Mon Jun 19 21:36:03 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA26850
	for <tcpimpl-archive@odin.ietf.org>; Mon, 19 Jun 2000 21:36:03 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA16364
	for tcp-impl-outgoing; Mon, 19 Jun 2000 18:45:21 -0400 (EDT)
Received: from seraph2.lerc.nasa.gov (firewall-user@guardian02.lerc.nasa.gov [139.88.146.11])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id SAA16346
	for <tcp-impl@grc.nasa.gov>; Mon, 19 Jun 2000 18:45:19 -0400 (EDT)
Received: by seraph2.lerc.nasa.gov; id SAA24994; Mon, 19 Jun 2000 18:45:19 -0400 (EDT)
Received: from aland.bbn.com(204.162.9.10) by seraph2.lerc.nasa.gov via smap (V5.0)
	id xma024827; Mon, 19 Jun 00 18:44:41 -0400
Received: from [128.33.238.65] (TC065.BBN.COM [128.33.238.65])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id SAA33173;
	Mon, 19 Jun 2000 18:44:37 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Mime-Version: 1.0
X-Sender: craigpop@aland.bbn.com
Message-Id: <p04310101b574511b59b8@[204.162.9.11]>
In-Reply-To: <NDBBLECHILBKKKIOKHPKOEALCGAA.wchung@ix.netcom.com>
References: <NDBBLECHILBKKKIOKHPKOEALCGAA.wchung@ix.netcom.com>
Date: Mon, 19 Jun 2000 18:42:29 -0400
To: wchung@ix.netcom.com, tcp-impl@grc.nasa.gov
From: Craig Partridge <craig@aland.bbn.com>
Subject: Re: ISA used in TCP/IP/UDP
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

At 2:41 PM -0700 6/16/00, Wilson C. Chung wrote:
>What are the typical instruction set extension on
>a CPU so that implementing TCP/IP/UDP network stacks
>would be more efficient?

That's a hard question to answer, since a lot depends on what the instruction
set already offers and whether your focus is on TCP/UDP/IP (i.e. a host) or
just IP (e.g. a router).

Here are some general comments about instruction sets on a host system using
TCP/UDP/IP:

* the instruction set should have good support for 8, 16, and 32 bit
    field manipulations in registers (e.g., get a value out of a particular
    offset, update it, put the value back in at that offset)

* a carry bit is *very* valuable

* have enough registers that, after the registers used by the compiler for
   stacks, return locations, etc, you can put all 40 bytes of the TCP and
   IP header into registers and still have 3 or 4 registers left over...

These are broad guidelines -- your architecture will vary.
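
As an illustration of the first two points, here is a minimal Python sketch
of a typical field manipulation: pull the TTL out of an IPv4 header, update
it, put it back, and patch the header checksum incrementally using the
ones-complement update from RFC 1624 (HC' = ~(~HC + ~m + m')). The function
names are made up for the example, and it assumes a 20-byte header with a
TTL greater than zero. The folding loops are exactly the work a carry bit
saves.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # this fold is what add-with-carry gives for free
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_checksum(header: bytes) -> int:
    """Checksum of a header whose checksum field is currently zero."""
    return ~ones_complement_sum(header) & 0xFFFF


def decrement_ttl(header: bytearray) -> None:
    """Decrement TTL (byte 8) and patch the checksum (bytes 10-11) in place."""
    old_word, = struct.unpack_from("!H", header, 8)   # TTL and protocol
    header[8] -= 1                                    # assumes TTL > 0
    new_word, = struct.unpack_from("!H", header, 8)
    old_ck, = struct.unpack_from("!H", header, 10)
    total = (~old_ck & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    struct.pack_into("!H", header, 10, ~total & 0xFFFF)
```

After `decrement_ttl`, summing the whole header (checksum included) with
`ones_complement_sum` should give 0xFFFF, which is the usual validity check.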

Craig


From owner-tcp-impl@lerc.nasa.gov  Mon Jun 19 23:23:16 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id XAA29677
	for <tcpimpl-archive@odin.ietf.org>; Mon, 19 Jun 2000 23:23:16 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id VAA23845
	for tcp-impl-outgoing; Mon, 19 Jun 2000 21:09:22 -0400 (EDT)
Received: from seraph2.lerc.nasa.gov (firewall-user@guardian02.lerc.nasa.gov [139.88.146.11])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id VAA23826
	for <tcp-impl@grc.nasa.gov>; Mon, 19 Jun 2000 21:09:20 -0400 (EDT)
Received: by seraph2.lerc.nasa.gov; id VAA26079; Mon, 19 Jun 2000 21:09:20 -0400 (EDT)
Received: from postal.redback.com(155.53.12.9) by seraph2.lerc.nasa.gov via smap (V5.0)
	id xma025937; Mon, 19 Jun 00 21:08:49 -0400
Received: from green.redback.com (green.redback.com [155.53.36.109])
	by postal.redback.com (Postfix) with ESMTP
	id EF1632AA09; Mon, 19 Jun 2000 18:08:43 -0700 (PDT)
Received: from green.redback.com by green.redback.com (8.9.3) id SAA28358; Mon, 19 Jun 2000 18:06:58 -0700 (PDT)
Message-Id: <200006200106.SAA28358@green.redback.com>
X-Mailer: exmh version 2.1.0 09/18/1999
To: wchung@ix.netcom.com
Cc: tcp-impl@grc.nasa.gov
Subject: Re: ISA used in TCP/IP/UDP 
In-Reply-To: Your message of "Fri, 16 Jun 2000 14:41:25 PDT."
             <NDBBLECHILBKKKIOKHPKOEALCGAA.wchung@ix.netcom.com> 
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="==_Exmh_-866565274P";
	 micalg=pgp-sha1; protocol="application/pgp-signature"
Content-Transfer-Encoding: 7bit
Date: Mon, 19 Jun 2000 18:06:58 -0700
From: Greg Minshall <minshall@redback.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

--==_Exmh_-866565274P
Content-Type: text/plain; charset=us-ascii

> What are the typical instruction set extension on
> a CPU so that implementing TCP/IP/UDP network stacks
> would be more efficient?

add-with-carry is a nice instruction; it makes it somewhat faster to do IP 
checksums...


--==_Exmh_-866565274P
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 5.0i for non-commercial use
MessageID: owCZtBGr1SAGm57WgBCuKzKs0IPp4cpX

iQA/AwUBOU7DsW1GBZxTyU5lEQJBzACeOzwwHktqBfxkZ7eLuqCfFisgWGQAnRNl
TOcHsUk6xQ6BpJXMbmpboJOA
=Gj0r
-----END PGP SIGNATURE-----

--==_Exmh_-866565274P--


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 03:47:53 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id DAA14884
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 03:47:52 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id BAA06174
	for tcp-impl-outgoing; Tue, 20 Jun 2000 01:23:33 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id BAA06165
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 01:23:32 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id BAA04974; Tue, 20 Jun 2000 01:23:30 -0400 (EDT)
Received: from prv-mail20.provo.novell.com(137.65.81.122) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma004921; Tue, 20 Jun 00 01:22:47 -0400
Received: from INET-PRV-Message_Server by prv-mail20.provo.novell.com
	with Novell_GroupWise; Mon, 19 Jun 2000 23:22:40 -0600
Message-Id: <s94eab40.043@prv-mail20.provo.novell.com>
X-Mailer: Novell GroupWise Internet Agent 5.5.3.1
Date: Mon, 19 Jun 2000 23:22:18 -0600
From: "Narsimharao Nagampalli" <NNARASIMHARAO@novell.com>
To: <tcp-impl@grc.nasa.gov>, <wchung@ix.netcom.com>, <minshall@redback.com>
Subject: Re: ISA used in TCP/IP/UDP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by lombok-fi.lerc.nasa.gov id BAA06168
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 8bit

Other areas to look into would be encryption for IPsec and compression. Hardware support for encryption would be good.

-Narsi



From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 11:26:25 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id LAA25924
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 11:26:25 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id IAA04864
	for tcp-impl-outgoing; Tue, 20 Jun 2000 08:42:26 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id IAA04826
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 08:42:23 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id IAA21110; Tue, 20 Jun 2000 08:42:21 -0400 (EDT)
Received: from linux.klos.com(192.80.49.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma021047; Tue, 20 Jun 00 08:41:51 -0400
Received: (from patrick@localhost)
	by klos.com (8.9.3/8.9.3) id IAA06761;
	Tue, 20 Jun 2000 08:41:36 -0400
Date: Tue, 20 Jun 2000 08:41:36 -0400
From: Patrick Klos <patrick@klos.com>
Message-Id: <200006201241.IAA06761@klos.com>
To: minshall@redback.com, wchung@ix.netcom.com
Subject: Re: ISA used in TCP/IP/UDP
Cc: tcp-impl@grc.nasa.gov
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

>> What are the typical instruction set extension on
>> a CPU so that implementing TCP/IP/UDP network stacks
>> would be more efficient?
>
>add-with-carry is a nice instruction; it makes it somewhat faster to do IP 
>checksums...

I got the impression the original poster was asking for real extensions - 
not items that are already available on most processors.  Maybe I'm wrong.

If I were building a processor with TCP/UDP/IP specific extensions, I'd
probably add some of the following:

1)  Checksum register/instructions:

	Compute standard checksum on a range of bytes (useful to
	check IP header checksum).

	Copy and compute standard checksum (useful when collecting
	pieces of a packet before sending it).

	Initialize checksum (used to extract pseudo-header and compute
	checksum on it).

    Checksum instructions would be supported by several registers:

	Accumulator (the running checksum total)

	Checksum offset (where in the "packet" to ignore the data since
	the real checksum is stored there).

2)  Keep in mind similar extensions to support the possibility of
    IPv6.

3)  CRC instructions:

	Set 16 or 32 bit polynomial.

4)  Hash instructions?  (useful for routers)

5)  Built-in "mbuf"s?  I.e. the processor understands a specific scheme
    of chaining buffers to create packets, thus can more efficiently
    poke around in a packet for you, rather than your code having to
    always do it by hand.
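
A rough software model of item 1, in Python, might look like the sketch
below. Everything here is hypothetical: the class name, the method names and
the restriction to an even checksum offset are assumptions made for
illustration, not a description of any real processor.

```python
from typing import Optional


class ChecksumUnit:
    """Toy model of a checksum accumulator plus a checksum-offset register."""

    def __init__(self) -> None:
        self.accumulator = 0                        # running checksum total
        self.checksum_offset: Optional[int] = None  # word to ignore, if any

    def _add_word(self, word: int) -> None:
        total = self.accumulator + word
        # End-around carry: one fold suffices for a single 16-bit add.
        self.accumulator = (total & 0xFFFF) + (total >> 16)

    def initialize(self, seed: bytes = b"",
                   checksum_offset: Optional[int] = None) -> None:
        """Reset the accumulator, optionally seeding it (e.g. a pseudo-header)."""
        self.accumulator = 0
        self.checksum_offset = checksum_offset
        self._feed(seed, skip=None)

    def _feed(self, data: bytes, skip: Optional[int]) -> None:
        if len(data) % 2:
            data += b"\x00"
        for i in range(0, len(data), 2):
            if i == skip:               # the stored checksum is not summed
                continue
            self._add_word((data[i] << 8) | data[i + 1])

    def compute(self, packet: bytes) -> int:
        """Sum a range of bytes and return the complemented checksum."""
        self._feed(packet, skip=self.checksum_offset)
        return ~self.accumulator & 0xFFFF
```

Checking an IP header would then be `initialize(checksum_offset=10)` followed
by comparing `compute(header)` with the stored field; for TCP or UDP the
pseudo-header bytes would go in as the seed.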

Just a few ideas...
============================================================================
    Patrick Klos                           Email: patrick@klos.com
    Klos Technologies, Inc.                Web:   http://www.klos.com/
============================================================================


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 11:26:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id LAA25935
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 11:26:28 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id IAA07379
	for tcp-impl-outgoing; Tue, 20 Jun 2000 08:58:20 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id IAA07311;
	Tue, 20 Jun 2000 08:58:07 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id IAA23450; Tue, 20 Jun 2000 08:58:07 -0400 (EDT)
Received: from ada.cs.ucy.ac.cy(194.42.10.200) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma023399; Tue, 20 Jun 00 08:57:57 -0400
Received: from cs144 (cs144.cs.ucy.ac.cy [194.42.7.56])
	by ada.cs.ucy.ac.cy (8.8.8/8.8.8) with SMTP id QAA34954;
	Tue, 20 Jun 2000 16:04:45 +0300
Message-ID: <0d0501bfdab8$bf382a70$38072ac2@cs.ucy.ac.cy>
From: "Andreas Pitsillides" <andreas.pitsillides@ucy.ac.cy>
To: <xtp-relay@cs.concordia.ca>, <webrepl@cs.utk.edu>,
        "\"terena list\"" <ga@terena.nl>, <tcpsat@lerc.nasa.gov>,
        <tcp-impl@lerc.nasa.gov>, <tci-announce@computer.org>, <tccc@ieee.org>,
        <reres@laas.fr>, <request-datacom@comsoc.org>,
        <performance@haven.epm.ornl.gov>, <itc@ieee.org>,
        <iscc2000@infres.enst.fr>, <ieeetcpc@listserv.utoronto.ca>,
        <ieee_rtc_list@cs.tamu.edu>, "\"GU-NET\"" <gu-net@gunet.gr>,
        "\"Globecom\"" <confs-globecom@comsoc.org>,
        <f-troup@CODEX.CIS.upenn.edu>, <fokus-user@fokus.gmd.de>,
        <ctc-members@tinac.com>, <Cost264@lip6.fr>,
        <cost257@informatik.uni-wuerzburg.de>,
        <cost237-transport@comp.lancs.ac.uk>,
        "\"Conferencesa\"" <confs-conferencesa@comsoc.org>,
        <comswtc@comsoc.org>, "\"comm-theory\"" <comm-theory@ieee.org>,
        "\"cnom\"" <cnom@maestro.bellcore.com>,
        "\"alg\"" <alg@comm.toronto.edu>, <iwqos@comsoc.org>
Cc: "Andreas Pitsillides" <Andreas.Pitsillides@ucy.ac.cy>
Subject: IEEE Infocom 2001 ---> 13 DAYS to GO ---- HAVE YOU SUBMITTED YOUR PAPER ?  
Date: Tue, 20 Jun 2000 16:08:15 +0300
MIME-Version: 1.0
Content-Type: text/plain;
	charset="windows-1253"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6600
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

Please accept our apologies for multiple copies.

HAVE YOU SUBMITTED YOUR PAPER ?- 13 DAYS to GO

----------------------------------<>-----------------------------------
The 20th Annual Conference of IEEE Communications and Computer Societies
                      C A L L   F O R    P A P E R S
                   I E E E   I N F O C O M    2 0 0 1


               The Conference on Computer Communications
                "20 Years into the Communications Odyssey"

                     http://www.ieee-infocom.org/2001

                  April 22-26, 2001 - Anchorage, Alaska
        Sponsored by the IEEE Communications and Computer Societies

 CALL FOR PAPERS
 ================

 The major conference on computer communications and networking is
 celebrating its 20th anniversary in the splendid setting of Anchorage
 (Alaska) during the week of April 22-26. The conference will bring
 together researchers and practitioners of every aspect of digital
 communications and networks, presenting the most up-to-date results
 and achievements in these fields. The IEEE INFOCOM 2001 program committee
 is soliciting original papers describing state-of-the-art research and
 development in all areas of computer networking and data
 communications. Topics of interest include, but are not limited to,
 the following:


 BISDN and ATM                       Network management and control
 Billing and pricing                 Network measurements and testbeds
 Communication protocols             Protocol design and analysis
 Congestion and admission control    Quality of service
 Flow control                        Queueing theory
 Cryptography, information hiding    Scheduling
 Internet and web applications       Security and privacy
 Optical networks                    Storage area networks
 Mobile networks                     Switching and switch architectures
 Multicast                           Traffic management and control
 Multimedia                          Routing
 Multiple access                     Web performance and caching
 Network architectures               Wireless networks


 PAPER SUBMISSION
 ================

 Papers must be submitted electronically according to the instructions
 described in <http://www.ieee-infocom.org/2001> and summarized
 below. Proposals for panels, half- or full-day tutorials should be
 submitted to the respective chairs. Please refer to the conference web
 site for further details.

 Papers must be formatted according to the IEEE standard format except
 for the font size, which MUST be 11pt.  To make it easy to adhere to
 the formatting standard, we offer templates and samples for LaTeX,
 MSWord, and FrameMaker (please refer to the pertinent web pages at
 <http://www.ieee-infocom.org/2001>).
 -------------------------------------------------------------------
 PAPERS THAT DO NOT COMPLY WITH THE ABOVE FORMAT CANNOT BE REVIEWED
 -------------------------------------------------------------------

 Submissions must be in PDF or Postscript.  Postscript papers must use
 only standard PostScript fonts: Times Roman, Courier, Symbol, and
 Helvetica.  (Please note that Postscript output from MSWord typically
 does not work on non-Microsoft platforms.  The use of the Apple
 LaserWriter II printer driver is strongly recommended).  The above
 formatted papers can be submitted in a compressed form (gzip, zip,
 WinZip, compress).

 Because of the size limitation on the final manuscript, and to ensure
 that the reviewed paper and the final version have a similar size,
 -----------------------------------------------------
 PAPERS WITH MORE THAN 11 PAGES CANNOT BE REVIEWED
 -----------------------------------------------------
 (this is roughly equivalent to 20 double-spaced pages).

 Papers must be submitted electronically using the Web site at
 <http://www.ieee-infocom.org/2001>.  This web page contains exact and
 detailed instructions about the submission process. Author's contact
 information must be provided during submission. To save space, authors
 may omit this information from the paper itself.  Authors will receive
 an immediate notification of the successful receipt of the file
 containing their paper.  Subsequently, a formal notification will be
 sent after verifying that the paper can be printed successfully.

 -------------------------------------------------------------------------
 | SUBMISSIONS WILL ONLY BE ACCEPTED BETWEEN MAY 1ST AND JULY 5TH, 2000. |
 -------------------------------------------------------------------------
 SUBMISSION DEADLINES ARE STRICT!  PAPERS THAT HAVE BEEN IMPROPERLY
 SUBMITTED OR IMPROPERLY FORMATTED BY THE SUBMISSION DEADLINE WILL NOT BE
 CONSIDERED.  TO AVOID LAST MINUTE PROBLEMS, AUTHORS ARE ENCOURAGED TO SUBMIT
 THEIR PAPERS WELL IN ADVANCE OF THE DEADLINE.


 THE REVIEW PROCESS
 ==================

 Each paper will typically be reviewed by three independent reviewers,
 whose reviews will be relayed to the corresponding author.  Following
 last year's successful experiment, authors will have a chance to provide
 a limited rebuttal on the reviews before the program committee makes
 its final decision.


 TRAVEL GRANTS
 =============

 Limited travel assistance to students, post-docs and junior faculty
 presenting a paper in the conference will be available. Please refer
 to the conference web site for further details.


 IMPORTANT DATES
 ===============

    Complete paper due             July 5, 2000
    Notification of acceptance     October 31, 2000
    Final version due              December 31, 2000


 PROGRAM COMMITTEE CO-CHAIRS [infocom@watson.ibm.com]
 ===========================

    Rene L. Cruz, UCSD
    Giovanni Pacifici, IBM Research





From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 14:45:25 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA01475
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 14:45:25 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA09883
	for tcp-impl-outgoing; Tue, 20 Jun 2000 12:16:24 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id MAA09863
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 12:16:22 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id MAA26011; Tue, 20 Jun 2000 12:16:21 -0400 (EDT)
Received: from boreas.isi.edu(128.9.160.161) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma025939; Tue, 20 Jun 00 12:15:46 -0400
Received: from isi.edu (sci.isi.edu [128.9.160.93])
	by boreas.isi.edu (8.9.3/8.9.3) with ESMTP id JAA12858;
	Tue, 20 Jun 2000 09:15:35 -0700 (PDT)
Message-ID: <394F98A2.105E9FF7@isi.edu>
Date: Tue, 20 Jun 2000 09:15:30 -0700
From: Joe Touch <touch@ISI.EDU>
X-Mailer: Mozilla 4.73 [en] (Win98; U)
X-Accept-Language: en,pdf
MIME-Version: 1.0
To: Patrick Klos <patrick@klos.com>
CC: minshall@redback.com, wchung@ix.netcom.com, tcp-impl@grc.nasa.gov
Subject: Re: ISA used in TCP/IP/UDP
References: <200006201241.IAA06761@klos.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit



Patrick Klos wrote:
> 
> >> What are the typical instruction set extensions on
> >> a CPU so that implementing TCP/IP/UDP network stacks
> >> would be more efficient?
> >
> >add-with-carry is a nice instruction; it makes it somewhat faster to do IP
> >checksums...
> 
> I got the impression the original poster was asking for real extensions -
> not items that are already available on most processors.  Maybe I'm wrong.
> 
> If I were building a processor with TCP/UDP/IP specific extensions, I'd
> probably add some of the following:
> 
> 1)  Checksum register/instructions:
> 
>         Compute standard checksum on a range of bytes (useful to
>         check IP header checksum).
> 
>         Copy and compute standard checksum (useful when collecting
>         pieces of a packet before sending it).

Better if done by a DMA.

>         Initialize checksum (used to extract pseudo-header and compute
>         checksum on it).
> 
>     Checksum instructions would be supported by several registers:
> 
>         Accumulator (the running checksum total)
>
>         Checksum offset (where in the "packet" to ignore the data since
>         the real checksum is stored there).

A simpler solution would be "ones complement add" - a little more than
add with carry (i.e., there's no carry, it just gets folded in).
However, all that would save is one instruction each time the checksum
is read, since intermediate carries can be pipelined in.
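
The point about intermediate carries can be illustrated with a minimal C
sketch. It assumes the usual trick of summing 16-bit words into a wider
accumulator and folding the carries back in only when the checksum is read.
The function name inet_checksum is hypothetical, and the 32-bit accumulator
assumes inputs shorter than 128 KiB so that it cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement checksum over len bytes, treating the data as
 * big-endian 16-bit words.  Carries out of bit 15 simply accumulate in
 * the upper half of the 32-bit sum; no per-add carry handling is needed.
 * Assumption: len < 128 KiB, so the 32-bit accumulator cannot overflow. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, zero padded */
        sum += (uint32_t)data[0] << 8;

    /* End-around carry, done once when the checksum is read. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the eight bytes 00 01 f2 03 f4 f5 f6 f7 (the worked example in RFC 1071)
the wide sum is 0x2ddf0, which folds to 0xddf2, so the function returns
0x220d.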
 
> 2)  Keep in mind similar extensions to support the possibility of
>     IPv6.

Or lack of similar extensions. I.e., in IPv6, checksum of a range of
bytes becomes much less useful for header manipulations; checksum of
large quantities of data should be combined with DMA where possible.

> 5)  Built-in "mbuf"s?  I.e. the processor understands a specific scheme
>     of chaining buffers to create packets, thus can more efficiently
>     poke around in a packet for you, rather than your code having to
>     always do it by hand.

Assumes mbufs. There are versions of TCP that don't rely on that
structure, and the structure and especially the order of its components
can be OS and even compiler-specific. 

Joe


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 14:45:57 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id OAA01489
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 14:45:57 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA11597
	for tcp-impl-outgoing; Tue, 20 Jun 2000 12:28:23 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id MAA11567
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 12:28:21 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id MAA28056; Tue, 20 Jun 2000 12:28:20 -0400 (EDT)
Received: from linux.klos.com(192.80.49.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma028019; Tue, 20 Jun 00 12:28:06 -0400
Received: (from patrick@localhost)
	by klos.com (8.9.3/8.9.3) id MAA08269;
	Tue, 20 Jun 2000 12:28:04 -0400
Date: Tue, 20 Jun 2000 12:28:04 -0400
From: Patrick Klos <patrick@klos.com>
Message-Id: <200006201628.MAA08269@klos.com>
To: patrick@klos.com, touch@ISI.EDU
Subject: Re: ISA used in TCP/IP/UDP
Cc: minshall@redback.com, tcp-impl@grc.nasa.gov, wchung@ix.netcom.com
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

>> >> What are the typical instruction set extensions on
>> >> a CPU so that implementing TCP/IP/UDP network stacks
>> >> would be more efficient?
>> >
>> >add-with-carry is a nice instruction; it makes it somewhat faster to do IP
>> >checksums...
>> 
>> I got the impression the original poster was asking for real extensions -
>> not items that are already available on most processors.  Maybe I'm wrong.
>> 
>> If I were building a processor with TCP/UDP/IP specific extensions, I'd
>> probably add some of the following:
>> 
>> 1)  Checksum register/instructions:
>> 
>>         Compute standard checksum on a range of bytes (useful to
>>         check IP header checksum).
>> 
>>         Copy and compute standard checksum (useful when collecting
>>         pieces of a packet before sending it).
>
>Better if done by a DMA.

Sure, if you have DMA.  A DMA controller that could compute a running
checksum would be cool.

>>         Initialize checksum (used to extract pseudo-header and compute
>>         checksum on it).
>> 
>>     Checksum instructions would be supported by several registers:
>> 
>>         Accumulator (the running checksum total)
>>
>>         Checksum offset (where in the "packet" to ignore the data since
>>         the real checksum is stored there).
>
>A simpler solution would be "ones complement add" - a little more than
>add with carry (i.e., there's no carry, it just gets folded in).
>However, all that would save is one instruction each time the checksum
>is read, since intermediate carries can be pipelined in.

Ones complement add is already very easy to do with existing processors.
I'm talking about instructions that automatically add to a checksum as
a packet's contents are being copied (since they very often are unless
you have scatter-gather hardware to collect up the pieces of a packet,
in which case your other suggestion would work out).
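
What such a copy-and-checksum instruction would replace can be sketched in
software. This is only an illustration under stated assumptions: the names
copy_and_sum and fold_sum are hypothetical, every fragment except the last
is assumed to have an even length, and the total is assumed to stay under
128 KiB so the 32-bit running sum cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and add them, as big-endian 16-bit
 * words, to the running sum passed in.  Returns the updated sum.
 * Assumption: only the final fragment of a packet may have odd length. */
static uint32_t copy_and_sum(uint8_t *dst, const uint8_t *src,
                             size_t len, uint32_t sum)
{
    size_t i = 0;

    for (; i + 1 < len; i += 2) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {                      /* odd trailing byte, zero padded */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return sum;
}

/* Fold the accumulated carries and complement, once per packet. */
static uint16_t fold_sum(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A caller gathering a packet from several pieces would thread the same sum
through each copy_and_sum call and call fold_sum once at the end, so each
byte is touched only once.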
 
>> 2)  Keep in mind similar extensions to support the possibility of
>>     IPv6.
>
>Or lack of similar extensions. I.e., in IPv6, checksum of a range of
>bytes becomes much less useful for header manipulations; checksum of
>large quantities of data should be combined with DMA where possible.
>
>> 5)  Built-in "mbuf"s?  I.e. the processor understands a specific scheme
>>     of chaining buffers to create packets, thus can more efficiently
>>     poke around in a packet for you, rather than your code having to
>>     always do it by hand.
>
>Assumes mbufs. There are versions of TCP that don't rely on that
>structure, and the structure and especially the order of its components
>can be OS and even compiler-specific. 

Yes, it assumes mbufs.  It also assumes that a TCP stack for this special
processor would be optimized for this processor.  Most TCP stacks have
some notion similar to mbufs to help minimize copying packet contents
all over the place.  My suggestion is just to capitalize on a version of
that notion to speed up access to packet contents when necessary.
============================================================================
    Patrick Klos                           Email: patrick@klos.com
    Klos Technologies, Inc.                Web:   http://www.klos.com/
============================================================================


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 15:34:18 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA02347
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 15:34:17 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id NAA18244
	for tcp-impl-outgoing; Tue, 20 Jun 2000 13:09:44 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id NAA18210
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 13:09:41 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id NAA05699; Tue, 20 Jun 2000 13:09:40 -0400 (EDT)
Received: from boreas.isi.edu(128.9.160.161) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma005681; Tue, 20 Jun 00 13:09:38 -0400
Received: from isi.edu (sci.isi.edu [128.9.160.93])
	by boreas.isi.edu (8.9.3/8.9.3) with ESMTP id KAA19191;
	Tue, 20 Jun 2000 10:09:31 -0700 (PDT)
Message-ID: <394FA547.F2275E89@isi.edu>
Date: Tue, 20 Jun 2000 10:09:27 -0700
From: Joe Touch <touch@ISI.EDU>
X-Mailer: Mozilla 4.73 [en] (Win98; U)
X-Accept-Language: en,pdf
MIME-Version: 1.0
To: Patrick Klos <patrick@klos.com>
CC: minshall@redback.com, tcp-impl@grc.nasa.gov, wchung@ix.netcom.com
Subject: Re: ISA used in TCP/IP/UDP
References: <200006201628.MAA08269@klos.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit



Patrick Klos wrote:
> 
> >> >> What are the typical instruction set extensions on
> >> >> a CPU so that implementing TCP/IP/UDP network stacks
> >> >> would be more efficient?
> >> >
> >> >add-with-carry is a nice instruction; it makes it somewhat faster to do IP
> >> >checksums...
> >>
> >> I got the impression the original poster was asking for real extensions -
> >> not items that are already available on most processors.  Maybe I'm wrong.
> >>
> >> If I were building a processor with TCP/UDP/IP specific extensions, I'd
> >> probably add some of the following:
> >>
> >> 1)  Checksum register/instructions:
> >>
> >>         Compute standard checksum on a range of bytes (useful to
> >>         check IP header checksum).
> >>
> >>         Copy and compute standard checksum (useful when collecting
> >>         pieces of a packet before sending it).
> >
> >Better if done by a DMA.
> 
> Sure, if you have DMA.  A DMA controller that could compute a running
> checksum would be cool.

We implemented one here at ISI in 1996, and it was based on emulating in
multiple chips what other DMA designers had already done. Not so much
cool, as cold.


> Ones complement add is already very easy to do with existing processors.
> I'm talking about instructions that automatically add to a checksum as
> a packet's contents are being copied (since they very often are unless
> you have scatter-gather hardware to collect up the pieces of a packet,
> in which case your other suggestion would work out).

Given typical processor pipelines, the arithmetic 'carry-add' is masked
by the bubble generated by the data move anyway.

> >> 5)  Built-in "mbuf"s?  I.e. the processor understands a specific scheme
> >>     of chaining buffers to create packets, thus can more efficiently
> >>     poke around in a packet for you, rather than your code having to
> >>     always do it by hand.
> >
> >Assumes mbufs. There are versions of TCP that don't rely on that
> >structure, and the structure and especially the order of its components
> >can be OS and even compiler-specific.
> 
> Yes, it assumes mbufs.  It also assumes that a TCP stack for this special
> processor would be optimized for this processor.  Most TCP stacks have
> some notion similar to mbufs to help minimize copying packet contents
> all over the place.  My suggestion is just to capitalize on a version of
> that notion to speed up access to packet contents when necessary.

Lisp processors tried this before, and it works. You can build deep data
structure knowledge into the processor if you want. While this is useful
for the scatter-gather DMA, why is it useful for a processor to be able
to do?

(and if I already have a scatter-gather DMA, and I already optimize my
kernel data structure to use the DMA's notation, how much do I win by
doing this too?)

Joe


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 16:10:47 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA03054
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 16:10:46 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id NAA22699
	for tcp-impl-outgoing; Tue, 20 Jun 2000 13:36:43 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id NAA22686
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 13:36:42 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id NAA10304; Tue, 20 Jun 2000 13:36:41 -0400 (EDT)
Received: from linux.klos.com(192.80.49.1) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma010268; Tue, 20 Jun 00 13:36:22 -0400
Received: (from patrick@localhost)
	by klos.com (8.9.3/8.9.3) id NAA08686;
	Tue, 20 Jun 2000 13:36:21 -0400
Date: Tue, 20 Jun 2000 13:36:21 -0400
From: Patrick Klos <patrick@klos.com>
Message-Id: <200006201736.NAA08686@klos.com>
To: tcp-impl@grc.nasa.gov, touch@ISI.EDU
Subject: Re: ISA used in TCP/IP/UDP
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

>> Sure, if you have DMA.  A DMA controller that could compute a running
>> checksum would be cool.
>
>We implemented one here at ISI in 1996, and it was based on emulating in
>multiple chips what other DMA designers had already done. Not so much
>cool, as cold.

Well, I expect (most of) the rest of the world hasn't seen this 
implementation.  Nor am I aware of any processor (i.e. single CPU) that 
has integrated such a feature into its core.

>> Ones complement add is already very easy to do with existing processors.
>> I'm talking about instructions that automatically add to a checksum as
>> a packet's contents are being copied (since they very often are unless
>> you have scatter-gather hardware to collect up the pieces of a packet,
>> in which case your other suggestion would work out).
>
>Given typical processor pipelines, the arithmetic 'carry-add' is masked
>by the bubble generated by the data move anyway.

You appear to be missing the point - the original question asked was "what
instructions could be added to a processor to make coding TCP/IP stacks
more efficient" (if I paraphrased it correctly?).  Maybe this person 
wants to build a new lightweight processor with similar efficiency to
today's complex processors, and target network devices specifically?

>> >> 5)  Built-in "mbuf"s?  I.e. the processor understands a specific scheme
>> >>     of chaining buffers to create packets, thus can more efficiently
>> >>     poke around in a packet for you, rather than your code having to
>> >>     always do it by hand.
>> >
>> >Assumes mbufs. There are versions of TCP that don't rely on that
>> >structure, and the structure and especially the order of its components
>> >can be OS and even compiler-specific.
>> 
>> Yes, it assumes mbufs.  It also assumes that a TCP stack for this special
>> processor would be optimized for this processor.  Most TCP stacks have
>> some notion similar to mbufs to help minimize copying packet contents
>> all over the place.  My suggestion is just to capitalize on a version of
>> that notion to speed up access to packet contents when necessary.
>
>Lisp processors tried this before, and it works. You can build deep data
>structure knowledge into the processor if you want. While this is useful
>for the scatter-gather DMA, why is it useful for a processor to be able
>to do?

It could make it easier to have code that says "give me bytes 20 and 21
of this packet" without the code having to walk the mbufs itself.  It
would make it easier to extract data from the packet arbitrarily.
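
The walk that such an instruction would hide looks roughly like the C sketch
below. The struct pkt_buf layout and the helper names are hypothetical and
much simpler than any real mbuf; real layouts differ between systems.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified buffer chain; not any real mbuf layout. */
struct pkt_buf {
    struct pkt_buf *next;   /* next buffer of this packet, or NULL */
    const uint8_t  *data;   /* first valid byte in this buffer */
    size_t          len;    /* number of valid bytes in this buffer */
};

/* Return the byte at packet offset off, or -1 if off is past the end. */
static int pkt_byte_at(const struct pkt_buf *b, size_t off)
{
    for (; b != NULL; b = b->next) {
        if (off < b->len)
            return b->data[off];
        off -= b->len;      /* skip this buffer and keep walking */
    }
    return -1;
}

/* Read a big-endian 16-bit field that may straddle two buffers.
 * Returns -1 if either byte lies past the end of the packet. */
static long pkt_u16_at(const struct pkt_buf *b, size_t off)
{
    int hi = pkt_byte_at(b, off);
    int lo = pkt_byte_at(b, off + 1);

    if (hi < 0 || lo < 0)
        return -1;
    return ((long)hi << 8) | lo;
}
```

With this sketch, "give me bytes 20 and 21" is pkt_u16_at(pkt, 20), and it
still works when byte 20 ends one buffer and byte 21 starts the next.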

You make it sound as if you don't think there are any possible enhancements
that could be made to a processor to optimize TCP/IP stack implementation?
Maybe the processors you use are not practical for the person who started
this thread??
============================================================================
    Patrick Klos                           Email: patrick@klos.com
    Klos Technologies, Inc.                Web:   http://www.klos.com/
============================================================================


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 16:36:49 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA03578
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 16:36:49 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA01468
	for tcp-impl-outgoing; Tue, 20 Jun 2000 14:29:17 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id OAA01436
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 14:29:15 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id OAA19168; Tue, 20 Jun 2000 14:29:12 -0400 (EDT)
Received: from calcite.rhyolite.com(38.159.140.3) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma019112; Tue, 20 Jun 00 14:28:56 -0400
Received: (from vjs@localhost)
	by calcite.rhyolite.com (8.9.3/calcite) id MAA17203
	for tcp-impl@grc.nasa.gov  env-from <vjs>;
	Tue, 20 Jun 2000 12:28:54 -0600 (MDT)
Date: Tue, 20 Jun 2000 12:28:54 -0600 (MDT)
From: Vernon Schryver <vjs@calcite.rhyolite.com>
Message-Id: <200006201828.MAA17203@calcite.rhyolite.com>
To: tcp-impl@grc.nasa.gov
Subject: Re: ISA used in TCP/IP/UDP
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> From: Joe Touch <touch@ISI.EDU>

> ...
> >     Checksum instructions would be supported by several registers:
> > 
> >         Accumulator (the running checksum total)
> >
> >         Checksum offset (where in the "packet" to ignore the data since
> >         the real checksum is stored there).
>
> A simpler solution would be "ones complement add" - a little more than
> add with carry (i.e., there's no carry, it just gets folded in).
> However, all that would save is one instruction each time the checksum
> is read, since intermediate carries can be pipelined in.

I don't see the utility of a 1's complement add, if only because its
necessarily fixed width would likely be wrong in many situations.
(yes, the checksum itself is 16 bits wide, but doesn't everyone compute
it in bulk at least 32 bits wide?)

My prejudice is also to checksum with fly-by-DMA.  However, with today's
infinitely fast CPUs (compared to memory) it's not clear what DMA means
or that making your DMA machinery uniquely fancy is the right choice. 
If your DMA engine is some kind of CPU or DSP core, you would want
instructions that don't make the TCP checksum painful, but that's no
more than a carry bit.

If you don't have a carry bit, as is the case in some major RISC
instruction sets, computing the TCP checksum requires twice as many
CPU instructions as it would otherwise for a given width.
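
The extra work without a carry bit can be seen in a small C sketch, since C
itself has no carry flag. The names oc_add32 and checksum32 are hypothetical,
and the input is assumed to be already assembled into 32-bit words in network
byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement add of two 32-bit values without a carry flag.
 * The compare (sum < a) recovers the carry that the add discarded,
 * and the second add folds it back in: two extra operations per word
 * compared with a single add-with-carry. */
static uint32_t oc_add32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return sum + (sum < a);     /* end-around carry */
}

/* Checksum n 32-bit words (each holding two 16-bit words, high half
 * first), then reduce the 32-bit ones-complement sum to 16 bits. */
static uint16_t checksum32(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum = oc_add32(sum, words[i]);

    sum = (sum & 0xFFFF) + (sum >> 16);   /* may leave a carry in bit 16 */
    sum = (sum & 0xFFFF) + (sum >> 16);   /* second fold absorbs it */
    return (uint16_t)~sum;
}
```

Summing 32 bits at a time gives the same 16-bit result as summing 16-bit
words, because the ones-complement sum can be folded at any width that is a
multiple of 16.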


> > 5)  Built-in "mbuf"s?  I.e. the processor understands a specific scheme
> >     of chaining buffers to create packets, thus can more efficiently
> >     poke around in a packet for you, rather than your code having to
> >     always do it by hand.
>
> Assumes mbufs. There are versions of TCP that don't rely on that
> structure, and the structure and especially the order of its components
> can be OS and even compiler-specific. 

Some of us have experienced such joys in a single release of a single
operating system (or at least single source tree) built with a single
(cross) compiler.  As 64-bit architectures finally become common and then
when less simplistic choices than long=int=pointer=64-bits are common,
many of us might have such joys.

Mbufs themselves are not hard.  All of the variations I've seen have been
designed to work well with existing simple instructions.
That's why for the classic flavors, there are C macros that expand to
very few instructions for the common operations.
The expensive parts of dealing with mbufs could not be put into reasonable
instructions.  For example, you really don't want single instructions to
fiddle with virtual memory machinery for mbufs.

Special instructions to understand mbufs sound too much like classic CISC
mistakes.  (One of my favorite CISC instructions was the CDC 3000's
evaluate-polynomial.)  It would be nice to think that the lessons learned
during the RISC revolution about gratuitous, because-it-might-be-nice
instruction set geegaws have not been forgotten.
Actually, that sour observation applies to the original question.


Vernon Schryver    vjs@rhyolite.com


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 17:35:02 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA04936
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 17:35:01 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA06881
	for tcp-impl-outgoing; Tue, 20 Jun 2000 14:58:36 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with ESMTP id OAA06866
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 14:58:34 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id OAA24831; Tue, 20 Jun 2000 14:58:32 -0400 (EDT)
Received: from boreas.isi.edu(128.9.160.161) by seraph3.lerc.nasa.gov via smap (V5.0)
	id xma024703; Tue, 20 Jun 00 14:57:48 -0400
Received: from isi.edu (sci.isi.edu [128.9.160.93])
	by boreas.isi.edu (8.9.3/8.9.3) with ESMTP id LAA02221;
	Tue, 20 Jun 2000 11:57:44 -0700 (PDT)
Message-ID: <394FBEA4.20F9AE41@isi.edu>
Date: Tue, 20 Jun 2000 11:57:40 -0700
From: Joe Touch <touch@ISI.EDU>
X-Mailer: Mozilla 4.73 [en] (Win98; U)
X-Accept-Language: en,pdf
MIME-Version: 1.0
To: Patrick Klos <patrick@klos.com>
CC: tcp-impl@grc.nasa.gov, touch@ISI.EDU
Subject: Re: ISA used in TCP/IP/UDP
References: <200006201736.NAA08686@klos.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit



Patrick Klos wrote:
> 
> >> Sure, if you have DMA.  A DMA controller that could compute a running
> >> checksum would be cool.
> >
> >We implemented one here at ISI in 1996, and it was based on emulating in
> >multiple chips what other DMA designers had already done. Not so much
> >cool, as cold.
> 
> Well, I expect (most of) the rest of the world hasn't seen this
> implementation.  Nor am I aware of any processor (i.e. single CPU) that
> has integrated such a feature into its core.

DMAs have it, typically ones developed for NICs. If you're DMAing data
more than once, you have other places to optimize.

> >Given typical processor pipelines, the arithmetic 'carry-add' is masked
> >by the bubble generated by the data move anyway.
> 
> You appear to be missing the point - the original question asked was "what
> instructions could be added to a processor to make coding TCP/IP stacks
> more efficient" (if I paraphrased it correctly?).  Maybe this person
> wants to build a new lightweight processor with similar efficiency to
> today's complex processors, and target network devices specifically?

What I'm saying is that computation can be overlapped with data moves
sufficiently that having a separate instruction doesn't make it run any
faster. And (as below) that most of the problems are implementation
related, not due to lack of processor capability.

> >> >> 5)  Built-in "mbuf"s?  I.e. the processor understands a specific scheme
...
> It could make it easier to have code that says "give me bytes 20 and 21
> of this packet" without the code having to walk the mbufs itself.  It
> would make it easier to extract data from the packet arbitrarily.

If those bytes aren't simple offsets from a single pointer, then you
have other problems.

> You make it sound as if you don't think there are any possible enhancements
> that could be made to a processor to optimize TCP/IP stack implementation?

I think most of the issues of inefficiency are related to implementation
specifics, not the lack of necessary instructions. Just one opinion,
though.

Joe


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 20 20:55:16 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA07151
	for <tcpimpl-archive@odin.ietf.org>; Tue, 20 Jun 2000 20:55:11 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA04442
	for tcp-impl-outgoing; Tue, 20 Jun 2000 18:34:34 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id SAA04419
	for <tcp-impl@grc.nasa.gov>; Tue, 20 Jun 2000 18:34:32 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id SAA04175; Tue, 20 Jun 2000 18:34:31 -0400
Received: from aland.bbn.com(204.162.9.10) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma004150; Tue, 20 Jun 00 18:34:19 -0400
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.9.3/8.9.3) with ESMTP id SAA35074;
	Tue, 20 Jun 2000 18:34:15 -0400 (EDT)
	(envelope-from craig@aland.bbn.com)
Message-Id: <200006202234.SAA35074@aland.bbn.com>
To: Joe Touch <touch@ISI.EDU>
cc: tcp-impl@grc.nasa.gov
Subject: Re: ISA used in TCP/IP/UDP 
In-reply-to: Your message of "Tue, 20 Jun 2000 09:15:30 PDT."
             <394F98A2.105E9FF7@isi.edu> 
Date: Tue, 20 Jun 2000 18:34:15 -0400
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk


In message <394F98A2.105E9FF7@isi.edu>, Joe Touch writes:

>>         Copy and compute standard checksum (useful when collecting
>>         pieces of a packet before sending it).
>
>Better if done by a DMA.

Joe -- several months ago I would have said this too -- but a study
of TCP checksum errors I did with Jonathan Stone suggests that
DMA logic on some systems is trashing the data periodically -- and that
the TCP checksum is catching the errors -- so do the checksum before you
DMA!

Craig


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 27 17:23:04 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA25065
	for <tcpimpl-archive@odin.ietf.org>; Tue, 27 Jun 2000 17:23:04 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA28580
	for tcp-impl-outgoing; Tue, 27 Jun 2000 14:33:04 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id OAA28493
	for <tcp-impl@grc.nasa.gov>; Tue, 27 Jun 2000 14:32:59 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id OAA15860; Tue, 27 Jun 2000 14:32:57 -0400
Received: from prv-mail21.provo.novell.com(137.65.81.126) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma015802; Tue, 27 Jun 00 14:32:04 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail21.provo.novell.com; Tue, 27 Jun 2000 12:31:27 -0600
Message-ID: <3958F301.13815801@Novell.COM>
Date: Tue, 27 Jun 2000 12:31:29 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: TCP-IMPL <tcp-impl@grc.nasa.gov>
CC: Ramesh Shankar <RShankar@novell.com>
Subject: Send window update algorithm ...
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk
Content-Transfer-Encoding: 7bit

[I apologise for sending this message again. But my previous attempt
resulted in an incomplete message being sent out by the screwed up
Netscape e-mail that I used].

--------------------
Here is what I understood about a correct implementation of a send
window update algorithm (This algorithm is responsible for deciding
whether or not the SEG.WND information in an incoming segment can be
used to update the SND.WND information):

1. Information from older segments shouldn't be used. (SEG.ACK < SND.UNA
or SEG.ACK > SND.MAX). It is assumed that the received segment has been
trimmed to fit in the receive window already.
2. Out of order segments shouldn't end up shrinking the window,
especially out of order plain ACKs (see RFC 793, page 43).

Here is one which I add to the above list:

3. With bidirectional data transfer, we could have updated the window
using an out of order segment. If the peer times out and retransmits (or
if we got a fast retransmit), we will start getting sequence #s which
are less than SND.WL1. As long as these segments ACK newer data or
advertise a bigger window, we should use SEG.WND to update SND.WND.

The standard BSD algorithm below doesn't handle case 3 mentioned above:

Use SEG.WND to update SND.WND if:

- if (SEG.SEQ > SND.WL1)
- OR if ((SEG.SEQ == SND.WL1) && (SEG.ACK > SND.WL2))
- OR if ((SEG.ACK == SND.WL2) && (SEG.WND > SND.WND))
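
The three tests above can be written as a C predicate, as a sketch only. The
struct snd_state and the function name are hypothetical. Two assumptions are
made beyond the list: comparisons use modular sequence arithmetic in the
style of BSD's SEQ_GT, and the third test is nested under
SEG.SEQ == SND.WL1, which is how the 4.4BSD tcp_input code is usually
written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modular "greater than" for 32-bit sequence numbers. */
#define SEQ_GT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

struct snd_state {          /* hypothetical holder for RFC 793 variables */
    uint32_t wnd;           /* SND.WND */
    uint32_t wl1;           /* SND.WL1: SEG.SEQ of last window update */
    uint32_t wl2;           /* SND.WL2: SEG.ACK of last window update */
};

/* BSD-style rule: returns true and records the update if SEG.WND is
 * accepted, false if the segment's window information is ignored. */
static bool bsd_window_update(struct snd_state *s, uint32_t seg_seq,
                              uint32_t seg_ack, uint32_t seg_wnd)
{
    if (SEQ_GT(seg_seq, s->wl1) ||
        (seg_seq == s->wl1 && SEQ_GT(seg_ack, s->wl2)) ||
        (seg_seq == s->wl1 && seg_ack == s->wl2 && seg_wnd > s->wnd)) {
        s->wnd = seg_wnd;
        s->wl1 = seg_seq;
        s->wl2 = seg_ack;
        return true;
    }
    return false;
}
```

With SND.WL1 = 1000, SND.WL2 = 500 and SND.WND = 4096, a retransmitted
segment with SEG.SEQ = 900, SEG.ACK = 600 and SEG.WND = 0 fails all three
tests, so the zero window is not recorded. That is the case 3 behaviour.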

In fact, during my testing, I ran exactly into case 3 mentioned above.
Instead of using the window information from a retransmission by the
peer which ACKed new data that we sent and which made the window 0, we
ignored the SEG.WND information (as SEG.SEQ < SND.WL1) and continued to
think that we had a non-zero window and tried to send another segment.
To the code this appeared as if the window had shrunk.

I am not sure why the following algorithm can't be used, which handles
all the cases mentioned above:

Use SEG.WND to update SND.WND if:

- if (SEG.SEQ > SND.WL1) [Set SND.WL1 = SEG.SEQ]
- OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]
- OR if ((SEG.SEQ == SND.WL1) && (SEG.ACK == SND.WL2) && 
	(SEG.WND > SND.WND)) [Don't update either SND.WL1 or SND.WL2].
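
As a sketch, the proposed rule could look like this in C. The struct and
function names are hypothetical, comparisons are assumed to use modular
sequence arithmetic, and the bracketed notes are read as "advance SND.WL1
only when SEG.SEQ is newer, advance SND.WL2 only when SEG.ACK is newer",
which is one possible reading of the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modular "greater than" for 32-bit sequence numbers. */
#define SEQ_GT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

struct snd_state {          /* hypothetical holder for RFC 793 variables */
    uint32_t wnd;           /* SND.WND */
    uint32_t wl1;           /* SND.WL1 */
    uint32_t wl2;           /* SND.WL2 */
};

/* Proposed rule: accept SEG.WND if the segment carries a newer sequence
 * number, or a newer ACK, or the same SEQ/ACK with a larger window.
 * Assumes the segment already passed the SND.UNA..SND.MAX ACK check. */
static bool proposed_window_update(struct snd_state *s, uint32_t seg_seq,
                                   uint32_t seg_ack, uint32_t seg_wnd)
{
    bool newer_seq = SEQ_GT(seg_seq, s->wl1);
    bool newer_ack = SEQ_GT(seg_ack, s->wl2);
    bool bigger    = seg_seq == s->wl1 && seg_ack == s->wl2 &&
                     seg_wnd > s->wnd;

    if (!(newer_seq || newer_ack || bigger))
        return false;

    if (newer_seq)
        s->wl1 = seg_seq;   /* [Set SND.WL1 = SEG.SEQ] */
    if (newer_ack)
        s->wl2 = seg_ack;   /* [Set SND.WL2 = SEG.ACK] */
    s->wnd = seg_wnd;
    return true;
}
```

With SND.WL1 = 1000, SND.WL2 = 500 and SND.WND = 4096, the retransmission
from case 3 (SEG.SEQ = 900, SEG.ACK = 600, SEG.WND = 0) is accepted through
the second test: SND.WND becomes 0, SND.WL2 becomes 600 and SND.WL1 stays at
1000. A later stale copy of the same segment is then ignored.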

Any insight would be greatly appreciated.

Thanks,

S.R.


From owner-tcp-impl@lerc.nasa.gov  Tue Jun 27 17:23:06 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA25076
	for <tcpimpl-archive@odin.ietf.org>; Tue, 27 Jun 2000 17:23:06 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA26501
	for tcp-impl-outgoing; Tue, 27 Jun 2000 14:20:56 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id OAA26425
	for <tcp-impl@grc.nasa.gov>; Tue, 27 Jun 2000 14:20:48 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id OAA14356; Tue, 27 Jun 2000 14:20:46 -0400
Received: from prv-mail21.provo.novell.com(137.65.81.126) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma014303; Tue, 27 Jun 00 14:20:18 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail21.provo.novell.com; Tue, 27 Jun 2000 12:19:56 -0600
Message-ID: <3958F04C.AB10CDD2@Novell.COM>
Date: Tue, 27 Jun 2000 12:19:56 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: TCP-IMPL <tcp-impl@grc.nasa.gov>
Subject: Send window update algorithm ...
Content-Type: multipart/mixed;
 boundary="------------2019ADE33AA3CC2B07AB39A2"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------2019ADE33AA3CC2B07AB39A2
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here is what I understood about a correct implementation of a send
window update algorithm (This algorithm is responsible for deciding
whether or not the SEG.WND information in an incoming segment can be
used to update the SND.WND information):

1. Information from older segments shouldn't be used. (SEG.ACK < SND.UNA
or SEG.ACK > SND.MAX). It is assumed that the received segment has
already been trimmed to fit in the receive window.
2. Out of order segments shouldn't end up shrinking the window
(especially out of order plain ACKs: see RFC 793, page 43).

Here is one which I add to the above list:

3. With bidirectional data transfer, we could have updated the window
using an out of order segment. If the peer times out and retransmits (or
if we got a fast retransmit), we will start getting sequence #s which
are less than SND.WL1. As long as these segments ACK newer data or
advertise a bigger window, we should use SEG.WND to update SND.WND.

The standard BSD algorithm below doesn't handle case 3 mentioned above:

Use SEG.WND to update SND.WND if:

- if (SEG.SEQ > SND.WL1)
- OR if ((SEG.SEQ == SND.WL1) && (SEG.ACK > SND.WL2))
- OR if ((SEG.ACK == SND.WL2) && (SEG.WND > SND.WND))

In fact, during my testing, I ran exactly into case 3 mentioned above.
Instead of using the window information from a retransmission by the
peer which ACKed new data that we sent and which made the window 0, we
ignored the SEG.WND information (as SEG.SEQ < SND.WL1) and continued to
think that we had a non-zero window and tried to send another segment.
To the code this appeared as if the window had shrunk.

I am not sure why the following algorithm can't be used, which handles
all the cases mentioned above:

Use SEG.WND to update SND.WND if:

- if (SEG.SEQ > SND.WL1) [Set SND.WL1 = SEG.SEQ]
- OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]
- OR if ((SEG.SEQ == SND.WL1) && (SEG.ACK == SND.WL2) && (SEG.WND >
SND.WND))
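
A minimal C sketch of the two checks listed above, evaluated against a
case 3 segment. The struct and function names are hypothetical, and the
comparisons are assumed to use the usual wraparound-safe signed
difference on 32-bit sequence numbers. Only the accept/reject decision
is shown; the bracketed SND.WL1/SND.WL2 updates are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" for 32-bit sequence numbers. */
#define SEQ_GT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical, stripped-down send-side state (names follow RFC 793). */
struct snd_state {
    uint32_t wnd; /* SND.WND */
    uint32_t wl1; /* SND.WL1: SEG.SEQ of the last window update */
    uint32_t wl2; /* SND.WL2: SEG.ACK of the last window update */
};

struct segment {
    uint32_t seq, ack, wnd; /* SEG.SEQ, SEG.ACK, SEG.WND */
};

/* The BSD-style check, written as the three conditions are listed above. */
static bool bsd_check(const struct snd_state *s, const struct segment *g)
{
    return SEQ_GT(g->seq, s->wl1) ||
           (g->seq == s->wl1 && SEQ_GT(g->ack, s->wl2)) ||
           (g->ack == s->wl2 && g->wnd > s->wnd);
}

/* The proposed check: a newer SEG.ACK alone is enough. */
static bool proposed_check(const struct snd_state *s, const struct segment *g)
{
    return SEQ_GT(g->seq, s->wl1) ||
           SEQ_GT(g->ack, s->wl2) ||
           (g->seq == s->wl1 && g->ack == s->wl2 && g->wnd > s->wnd);
}
```

With SND.WL1 = 5000 and SND.WL2 = 1000, a retransmitted segment carrying
SEG.SEQ = 4000, SEG.ACK = 2000, SEG.WND = 0 is rejected by bsd_check()
and accepted by proposed_check().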
--------------2019ADE33AA3CC2B07AB39A2
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------2019ADE33AA3CC2B07AB39A2--



From owner-tcp-impl@lerc.nasa.gov  Wed Jun 28 15:36:05 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA02546
	for <tcpimpl-archive@odin.ietf.org>; Wed, 28 Jun 2000 15:36:04 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id MAA08108
	for tcp-impl-outgoing; Wed, 28 Jun 2000 12:57:22 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id MAA08042
	for <tcp-impl@grc.nasa.gov>; Wed, 28 Jun 2000 12:57:16 -0400 (EDT)
From: kuznet@ms2.inr.ac.ru
Received: by seraph3.lerc.nasa.gov; id MAA22645; Wed, 28 Jun 2000 12:57:14 -0400
Received: from minus.inr.ac.ru(193.233.7.97) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma022556; Wed, 28 Jun 00 12:56:16 -0400
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA01363; Wed, 28 Jun 2000 20:56:04 +0400
Message-Id: <200006281656.UAA01363@ms2.inr.ac.ru>
Subject: Re: Send window update algorithm ...
To: RShankar@novell.com (Ramesh Shankar)
Date: Wed, 28 Jun 2000 20:56:03 +0400 (MSK DST)
Cc: tcp-impl@grc.nasa.gov, RShankar@novell.com
In-Reply-To: <3958F301.13815801@Novell.COM> from "Ramesh Shankar" at Jun 27, 0 12:31:29 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello!

> - OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]

The same problem is present in Linux tcp implementation.

This was the first idea that came to my mind.
But then I realized that it is highly dubious.
Look, with this change SND.UNA==SND.WL2 _identically_ and
SND.WL2 can simply be omitted from TCB state. Right? 8)8)
So, the idea is suspiciously good. 8)

As far as I understand, the RFC 793 window update algorithm,
which ignores window updates with SEG.SEQ < SND.WL1, is designed
on the assumption that the receiver MAY _shrink_ the window deliberately.
In this case the check SEG.SEQ >= SND.WL1 prevents spurious window
reopening.

Essentially, TCP is designed to deliver reliably
not only data, but also window updates. Certainly,
it is useful only if receiver is allowed to shrink window,
so that its status is marginal. Maybe, RFC is even wrong.
I would prefer that it were wrong. 8)


So. The thing, which I proposed to make in this situation is:

1. To leave RFC algorithm intact, as checked and proven by time.

2. But in the case when an ACK advances SND.UNA but the
   window update from it is ignored by the RFC rules,
   hold the right edge of the window untouched,
   i.e. reduce SND.WND by SEG.ACK - old SND.UNA.

This change is more conservative.

Note that it works exactly as with your change in the situation
of lockup, described by you. But it is more robust, because
it rejects spurious window inflation, when receiver really tries
to shrink window.
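
A minimal C sketch of rule 2 above, under stated assumptions: the names
are hypothetical, window_accepted stands for the result of the unchanged
RFC check, SEG.ACK has already been validated against SND.NXT, and the
clamp at zero is an added guard that the text above does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" for 32-bit sequence numbers. */
#define SEQ_GT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical, stripped-down send-side state (names follow RFC 793). */
struct snd_state {
    uint32_t una; /* SND.UNA */
    uint32_t wnd; /* SND.WND */
};

/*
 * Process the ACK field of one segment. If the window update was
 * ignored, keep the right edge (SND.UNA + SND.WND) where it was by
 * shrinking SND.WND by the amount of newly ACKed data.
 */
static void ack_with_held_edge(struct snd_state *s, uint32_t seg_ack,
                               uint32_t seg_wnd, bool window_accepted)
{
    uint32_t old_una = s->una;
    bool advanced = SEQ_GT(seg_ack, old_una);

    if (advanced)
        s->una = seg_ack;

    if (window_accepted) {
        s->wnd = seg_wnd;
    } else if (advanced) {
        uint32_t acked = seg_ack - old_una;
        /* Assumption: never let SND.WND go below zero. */
        s->wnd = (acked < s->wnd) ? s->wnd - acked : 0;
    }
}
```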

Alexey Kuznetsov


From owner-tcp-impl@lerc.nasa.gov  Wed Jun 28 17:00:40 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA04154
	for <tcpimpl-archive@odin.ietf.org>; Wed, 28 Jun 2000 17:00:39 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA24948
	for tcp-impl-outgoing; Wed, 28 Jun 2000 14:34:35 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id OAA24896
	for <tcp-impl@grc.nasa.gov>; Wed, 28 Jun 2000 14:34:33 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id OAA04539; Wed, 28 Jun 2000 14:34:28 -0400
Received: from prv-mail20.provo.novell.com(137.65.81.122) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma004494; Wed, 28 Jun 00 14:34:05 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail20.provo.novell.com; Wed, 28 Jun 2000 12:34:00 -0600
Message-ID: <395A4511.DA898B5B@Novell.COM>
Date: Wed, 28 Jun 2000 12:33:53 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: kuznet@ms2.inr.ac.ru
CC: TCP-IMPL <tcp-impl@grc.nasa.gov>, Ramesh Shankar <RShankar@novell.com>
Subject: Re: Send window update algorithm ...
References: <200006281656.UAA01363@ms2.inr.ac.ru>
Content-Type: multipart/mixed;
 boundary="------------D4389761F7E23D18983FA8A5"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------D4389761F7E23D18983FA8A5
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

kuznet@ms2.inr.ac.ru wrote:
> 
> Hello!
> 
> > - OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]
> 
> The same problem is present in Linux tcp implementation.

The last time I looked at Linux, it seemed to have a totally different
set of problems and was different from the BSD implementation. I don't
recall anything further though.

> 
> This idea was the first, which came to my head.
> But then I understood that it is highly dubious.
> Look, with this change SND.UNA==SND.WL2 _identically_ and
> can be simple omitted from TCB state. Right? 8)8)
> So, the idea is suspiciously good. 8)

Strictly speaking, SND.WL2 looks redundant even in the existing
implementation. SEG.ACK has to be >= SND.UNA. Otherwise, this is an "old
ACK" and we ignore it as per RFC 793, page 72. Also SEG.ACK has to be <=
SND.NXT. If an ACK is valid AND ACKs new data, SND.UNA would anyway get
updated. And TCP ACKs are cumulative. That is, one could as well check
SEG.ACK against SND.UNA instead of SND.WL2!  I don't know how SND.WL2
can legally be > SND.UNA :-)). i.e. we used the window information from
the ACK, but didn't use the ACK itself. 

It appears that the SND.WL2 use is directly based on the comment on page
43 of RFC 793. Since TCP ACKs are cumulative, it seems pretty trivial to
detect out of order ACK segments - an out of order ACK won't pass the
SEG.ACK >= SND.UNA check on RFC 793, page 72.

Okay, maybe I am seriously missing something here :-)).

> 
> To all that I understood, RFC 793 window update algorithm,
> ignoring window updates with SEG.SEQ < SND.WL1, is designed
> in assumption that receiver MAY _shrink_ window deliberately.
> In this case check SEG.SEQ >= SND.WL1 prevents spurious window
> reopening.
> 

Come to think of it, the intention of SEG.SEQ > SND.WL1 check is not
really that clear. A receiver can shrink the window using this loophole.
Perhaps it was intended that way: If new data is being sent or new data
is being ACKed (by the receiver), blindly use the window information
provided by the receiver.

Note, however, that the only illegal window shrinking case is that of 
pure ACKs containing window updates which shrink the window (i.e. they
didn't get out of order, they were actually sent that way). As the
current algorithms stand, one can shrink the window in a segment which
has SEG.SEQ > SND.WL1 or which has SEG.ACK > SND.WL2 (even though this is
strongly discouraged by RFC 1122).

The original RFC 793 algorithm didn't handle the case of pure window
updates coming out of order. (i.e. SEG.SEQ == SND.WL1, SEG.ACK ==
SND.WL2, SEG.WND specifies the updated window). This has been fixed in
the BSD algorithm. Hence the check:

if ((SEG.ACK == SND.WL2) && (SEG.WND > SND.WND))

Perhaps any of the original TCP implementors/RFC 793 writers can shed
some light on this.

Thanks,

S.R.
--------------D4389761F7E23D18983FA8A5
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------D4389761F7E23D18983FA8A5--



From owner-tcp-impl@lerc.nasa.gov  Wed Jun 28 17:43:41 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA04979
	for <tcpimpl-archive@odin.ietf.org>; Wed, 28 Jun 2000 17:43:40 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA02100
	for tcp-impl-outgoing; Wed, 28 Jun 2000 15:13:22 -0400 (EDT)
Received: from seraph2.lerc.nasa.gov (firewall-user@guardian02.lerc.nasa.gov [139.88.146.11])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id PAA02090
	for <tcp-impl@grc.nasa.gov>; Wed, 28 Jun 2000 15:13:21 -0400 (EDT)
From: kuznet@ms2.inr.ac.ru
Received: by seraph2.lerc.nasa.gov; id PAA12568; Wed, 28 Jun 2000 15:13:20 -0400
Received: from minus.inr.ac.ru(193.233.7.97) by seraph2.lerc.nasa.gov via smap (V5.5)
	id xma011534; Wed, 28 Jun 00 15:12:48 -0400
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA04185; Wed, 28 Jun 2000 23:12:32 +0400
Message-Id: <200006281912.XAA04185@ms2.inr.ac.ru>
Subject: Re: Send window update algorithm ...
To: RShankar@novell.com (Ramesh Shankar)
Date: Wed, 28 Jun 2000 23:12:32 +0400 (MSK DST)
Cc: tcp-impl@grc.nasa.gov, RShankar@novell.com
In-Reply-To: <395A4511.DA898B5B@Novell.COM> from "Ramesh Shankar" at Jun 28, 0 12:33:53 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello!

> The last time I looked at Linux, it seemed to have a totally different
> set of problems and was different from the BSD implementation.

They coincide in this place literally.



> Strictly speaking, SND.WL2 looks redundant even in the existing
> implementation. SEG.ACK has to be >= SND.UNA. Otherwise, this is an "old
> ACK" and we ignore it as per RFC 793, page 72. Also SEG.ACK has to be <=
> SND.NXT. If an ACK is valid AND ACKs new data, SND.UNA would anyway get
> updated. And TCP ACKs are cumulative. That is, one could as well check
> SEG.ACK against SND.UNA instead of SND.WL2!  I don't know how SND.WL2
> can legally be > SND.UNA :-)). i.e. we used the window information from
> the ACK, but didn't use the ACK itself. 

SND.UNA cannot be less than SND.WL2, certainly.

But it can be greater, when window update is not accepted due
to the first rule.

In this case SND.WL2 is frozen before SND.UNA and the first duplicate
ACK with SEG.ACK==SND.UNA will change window. See?
Normally, dupacks only increase window. But dupack with SEG.SEQ > SND.WL1
is allowed to shrink it.


> Note, however, that the only illegal window shrinking case is that of 
> pure ACKs containing window updates which shrink the window (i.e. they
> didn't get out of order, they were actually sent that way). As the
> current algorithms stand, one can shrink the window in a segment which
> has SEG.SEQ > SND.WL1 or which has SEG.ACK > SND.WL2 (even though this is
> strongly discouraged by RFC 1122).

Unfortunately, it was not discouraged strongly enough to make people
working on wireless devices forget about the idea of using this hole. 8)8)
Look at RFC 2757.



> The original RFC 793 algorithm didn't handle the case of pure window
> updates coming out of order. (i.e. SEG.SEQ == SND.WL1, SEG.ACK ==
> SND.WL2, SEG.WND specifies the updated window). This has been fixed in
> the BSD algorithm.

RFC793 forgot not only this. The case of SEG.ACK==SND.UNA
is completely forgotten on that page. 8)

Yes, I think all this chapter is one big bug yet.


Probably, correct approach prioritizes ACK and SEQ updates
in backward order, so that check should look like:

if (SEG.ACK > SND.UNA ||
    (SEG.ACK == SND.UNA &&
     (SEG.SEQ > SND.WL || (SEG.SEQ == SND.WL && SEG.WND > SND.WND)))) {
	SND.WND = SEG.WND;
	SND.WL = SEG.SEQ;
}
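
The check above can be read as a lexicographic "is newer" comparison on
(SEG.ACK, SEG.SEQ, SEG.WND) against (SND.UNA, SND.WL, SND.WND). A minimal
C sketch of that reading, with hypothetical names and wraparound-safe
sequence comparison assumed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe three-way compare of 32-bit sequence numbers. */
static int seq_cmp(uint32_t a, uint32_t b)
{
    int32_t d = (int32_t)(a - b);
    return (d > 0) - (d < 0);
}

/* Hypothetical, stripped-down send-side state. */
struct snd_state {
    uint32_t una; /* SND.UNA */
    uint32_t wl;  /* SND.WL: SEG.SEQ of the last accepted window update */
    uint32_t wnd; /* SND.WND */
};

/* ACK is compared first, then SEQ, then the window size itself. */
static bool window_is_newer(const struct snd_state *s, uint32_t seg_seq,
                            uint32_t seg_ack, uint32_t seg_wnd)
{
    int c = seq_cmp(seg_ack, s->una);
    if (c != 0)
        return c > 0;        /* newer ACK wins, older ACK loses */
    c = seq_cmp(seg_seq, s->wl);
    if (c != 0)
        return c > 0;        /* same ACK: newer SEQ wins */
    return seg_wnd > s->wnd; /* same ACK and SEQ: only a larger window */
}
```

When this returns true, the caller would set SND.WND = SEG.WND and
SND.WL = SEG.SEQ, and it must be evaluated against the value of SND.UNA
from before the ACK is applied.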


Alexey Kuznetsov


From owner-tcp-impl@lerc.nasa.gov  Wed Jun 28 17:53:36 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA05072
	for <tcpimpl-archive@odin.ietf.org>; Wed, 28 Jun 2000 17:53:35 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA06130
	for tcp-impl-outgoing; Wed, 28 Jun 2000 15:35:39 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id PAA06087
	for <tcp-impl@grc.nasa.gov>; Wed, 28 Jun 2000 15:35:35 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA12701; Wed, 28 Jun 2000 15:35:34 -0400
Received: from prv-mail21.provo.novell.com(137.65.81.126) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma012655; Wed, 28 Jun 00 15:35:27 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail21.provo.novell.com; Wed, 28 Jun 2000 13:35:18 -0600
Message-ID: <395A536F.7CA35A6E@Novell.COM>
Date: Wed, 28 Jun 2000 13:35:11 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: kuznet@ms2.inr.ac.ru, TCP-IMPL <tcp-impl@grc.nasa.gov>
CC: Ramesh Shankar <RShankar@novell.com>
Subject: Re: Send window update algorithm ...
References: <200006281912.XAA04185@ms2.inr.ac.ru>
Content-Type: multipart/mixed;
 boundary="------------D139B2C1C85E991CE13ACEE7"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------D139B2C1C85E991CE13ACEE7
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

> 
> SND.UNA cannot be less than SND.WL2, certainly.
> 
> But it can be greater, when window update is not accepted due
> to the first rule.
> 
> In this case SND.WL2 is frozen before SND.UNA and the first duplicate
> ACK with SEG.ACK==SND.UNA will change window. See?
> Normally, dupacks only increase window. But dupack with SEG.SEQ > SND.WL1
> is allowed to shrink it.
> 

I am not sure what you mean by "not accepted due to the first rule". In
the standard BSD scheme, the only case where new data is ACKed AND the
window information is NOT used is when SEG.SEQ < SND.WL1. As I pointed
out, this causes grief during retransmission involving bidirectional data
transfer. In fact, it would appear to me that SND.WL2 was probably
originally intended to freeze the window updates during retransmits (by
the receiver) when doing bidirectional data transfer. i.e. it is more of
a "feature" than a bug. If it really was the case, I am not sure what
the motivation behind this was.

I am not able to see any other case where SND.WL2 is not moved along
with SND.UNA. Thus, if it is okay to update the send window (I don't see
any reason why we shouldn't be) during retransmits by the receiver,
SND.WL2 naturally becomes redundant. 

From a coding perspective, I can understand why SND.WL2 may have been
there: by the time you get to the send window update part, your SND.UNA
has already been set to SEG.ACK. Hence, you can't compare SND.UNA with
SEG.ACK. SND.WL2 in this case really serves as value of SND.UNA *before*
it was set to SEG.ACK.

Thanks,

S.R.
--------------D139B2C1C85E991CE13ACEE7
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------D139B2C1C85E991CE13ACEE7--



From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 06:52:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id GAA25500
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 06:52:28 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id EAA02164
	for tcp-impl-outgoing; Thu, 29 Jun 2000 04:27:51 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id EAA02146
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 04:27:48 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id EAA21374; Thu, 29 Jun 2000 04:27:47 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma021326; Thu, 29 Jun 00 04:27:00 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id BAA11289
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 01:26:58 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.88.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id BAA29036
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 01:26:58 -0700 (PDT)
Received: from dors (awe185-77.AWE.Sun.COM [192.29.185.77])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5T8Qvj407711
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 01:26:57 -0700 (PDT)
Date: Thu, 29 Jun 2000 01:28:22 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <Roam.SIMC.2.0.6.962249981.3995.kcpoon@jurassic>
Message-ID: <Roam.SIMC.2.0.6.962267302.6501.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> In fact, during my testing, I ran exactly into case 3 mentioned above.
> Instead of using the window information from a retransmission by the
> peer which ACKed new data that we sent and which made the window 0, we
> ignored the SEG.WND information (as SEG.SEQ < SND.WL1) and continued to
> think that we had a non-zero window and tried to send another segment.

In the case described by Ramesh, if he follows 4.2.2.16 of RFC 1122, then
the worst case is that TCP sends some segments which are not accepted by
the other end, because the window update in the retransmitted segment is
not accepted.  Can it be treated as a "shrinking" window case?  TCP can
handle a shrunken window, right?

It seems to me that there can be quite a few corner cases when the current
check can fail.  The problem is that window updates do not have sequence
number.  I think this means that whatever we check, there can be cases when
the check fails.  The question is if we can catch the majority of cases so
that TCP works well.  And for those cases when the check fails, TCP should
still work, as in the case Ramesh described.

> Probably, correct approach prioritizes ACK and SEQ updates
> in backward order, so that check should look like:
> 
> if (SEG.ACK > SND.UNA ||
>     (SEG.ACK == SND.UNA &&
>      (SEG.SEQ > SND.WL || (SEG.SEQ == SND.WL && SEG.WND > SND.WND)))) {
> 	SND.WND = SEG.WND;
> 	SND.WL = SEG.SEQ;
> }

Isn't the first check above equal to the sentence on page 72 of RFC 793 below?

	If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
	updated.  

It is interesting that no implementation I know of has this simple check.
There must be a reason...  IMHO, the above check seems to work.

							K. Poon.
							kcpoon@eng.sun.com






From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 15:34:59 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id PAA18539
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 15:34:59 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id NAA11149
	for tcp-impl-outgoing; Thu, 29 Jun 2000 13:02:50 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id NAA11097
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 13:02:46 -0400 (EDT)
From: kuznet@ms2.inr.ac.ru
Received: by seraph3.lerc.nasa.gov; id NAA11599; Thu, 29 Jun 2000 13:02:45 -0400
Received: from minus.inr.ac.ru(193.233.7.97) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma011546; Thu, 29 Jun 00 13:01:52 -0400
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA15375; Thu, 29 Jun 2000 21:01:47 +0400
Message-Id: <200006291701.VAA15375@ms2.inr.ac.ru>
Subject: Re: Send window update algorithm ...
To: RShankar@novell.com (Ramesh Shankar)
Date: Thu, 29 Jun 2000 21:01:47 +0400 (MSK DST)
Cc: tcp-impl@grc.nasa.gov, RShankar@novell.com
In-Reply-To: <395A536F.7CA35A6E@Novell.COM> from "Ramesh Shankar" at Jun 28, 0 01:35:11 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello!

> I am not sure what you mean by "not accepted due to the first rule".

Seems, I understand source of misunderstanding. 8)

That guy, who fixed RFC algorithm old days, forgot to add one level
of parenthesis in hurry. 8)


	if (((tiflags & TH_ACK) && SEQ_LT(tp->snd_wl1, ti->ti_seq)) ||
	    (tp->snd_wl1 == ti->ti_seq && (SEQ_LT(tp->snd_wl2, ti->ti_ack)) ||
					  ^
	    (tp->snd_wl2 == ti->ti_ack && tiwin > tp->snd_wnd))) {
							       ^

Well, accepting a window update with arbitrary seq _only_ from dupacks,
but ignoring it for real good acks, is an evident misprint. 8)

It does not change anything except adding more mess.


So, do you see something wrong in:

	if ((tiflags & TH_ACK) &&
	     (SEQ_LT(tp->snd_una, ti->ti_ack) ||
	      SEQ_LT(tp->snd_wl1, ti->ti_seq) ||
	      (tp->snd_wl1==ti->ti_seq && tiwin > tp->snd_wnd)))

I.e. 

1. new ACK always updates window.
2. dupack updates window, if it advances SND.WL1,
   or its SEG.SEQ==SND.WL1, but advertised window increases.

SND.WL2 is not used, it is ==SND.UNA.
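
A minimal C sketch of this check as a single step. It assumes a
hypothetical cut-down control block, a segment that has TH_ACK set and
whose ACK has already been validated, and that snd_wl1 is recorded on
every accepted update. The "new ACK" test is taken before snd_una is
advanced, since afterwards it could no longer be true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is earlier than b" for 32-bit sequence numbers. */
#define SEQ_LT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

/* Hypothetical cut-down control block; field names follow the BSD ones. */
struct tcb {
    uint32_t snd_una;
    uint32_t snd_wl1;
    uint32_t snd_wnd;
};

/*
 * Handle the ACK and window fields of one segment.
 * Returns true if the advertised window was taken.
 */
static bool ack_and_window(struct tcb *tp, uint32_t ti_seq, uint32_t ti_ack,
                           uint32_t tiwin)
{
    /* Decide "new ACK" against the old snd_una, before it moves. */
    bool new_ack = SEQ_LT(tp->snd_una, ti_ack);

    if (new_ack)
        tp->snd_una = ti_ack;

    if (new_ack ||                              /* 1. new ACK          */
        SEQ_LT(tp->snd_wl1, ti_seq) ||          /* 2. dupack, newer SEQ */
        (tp->snd_wl1 == ti_seq && tiwin > tp->snd_wnd)) { /* bigger window */
        tp->snd_wnd = tiwin;
        tp->snd_wl1 = ti_seq;
        return true;
    }
    return false;
}
```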

Alexey


From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 17:30:34 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA20433
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 17:30:34 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA11738
	for tcp-impl-outgoing; Thu, 29 Jun 2000 15:04:14 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id PAA11695
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 15:04:10 -0400 (EDT)
From: kuznet@ms2.inr.ac.ru
Received: by seraph3.lerc.nasa.gov; id PAA25086; Thu, 29 Jun 2000 15:04:08 -0400
Received: from minus.inr.ac.ru(193.233.7.97) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma025067; Thu, 29 Jun 00 15:04:05 -0400
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA16182; Thu, 29 Jun 2000 23:03:53 +0400
Message-Id: <200006291903.XAA16182@ms2.inr.ac.ru>
Subject: Re: Send window update algorithm ...
To: Kacheong.Poon@Eng.Sun.COM
Date: Thu, 29 Jun 2000 23:03:53 +0400 (MSK DST)
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: <Roam.SIMC.2.0.6.962267302.6501.kcpoon@jurassic> from "Kacheong Poon" at Jun 29, 0 01:28:22 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello!

> > if (SEG.ACK > SND.UNA ||
> >     (SEG.ACK == SND.UNA &&
> >      (SEG.SEQ > SND.WL || (SEG.SEQ == SND.WL && SEG.WND > SND.WND)))) {
> > 	SND.WND = SEG.WND;
> > 	SND.WL = SEG.SEQ;
> > }
> 
> Isn't the first check above equal to the sentence on page 72 of RFC 793 below?
> 
> 	If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
> 	updated.  

This is a misprint in the RFC, fixed in RFC 1122, 4.2.2.20(g). The first < reads as <=.

In the pseudo-code above, the first line is required to eliminate
case of SEG.ACK==SND.UNA. The second line is redundant, indeed.

Alexey


From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 18:00:37 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA21049
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 18:00:37 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA17275
	for tcp-impl-outgoing; Thu, 29 Jun 2000 15:31:33 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id PAA17241
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 15:31:30 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA28320; Thu, 29 Jun 2000 15:31:29 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma028250; Thu, 29 Jun 00 15:30:36 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id MAA14007
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 12:30:34 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.86.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id MAA26025
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 12:30:34 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5TJUVj475586
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 12:30:31 -0700 (PDT)
Date: Thu, 29 Jun 2000 12:30:22 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <200006291903.XAA16182@ms2.inr.ac.ru>
Message-ID: <Roam.SIMC.2.0.6.962307022.26247.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> > 	If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
> > 	updated.  
> 
> This is a misprint in the RFC, fixed in RFC 1122, 4.2.2.20(g). The first <
> reads as <=.

It seems to me that RFC 1122's correction (4.2.2.20g) is not correct in the
window update part.  It is correct in the part dealing with duplicate ACK. 
The send window should not be updated unconditionally when SND.UNA == SEG.ACK.
No implementation does that anyway.  I think the original check in RFC 793 
above is correct, so it does not need to be corrected C:

Alexey, can you send your suggested check to end2end list to see if people
see any problem in that?  Thanks.

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 19:16:02 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA22782
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 19:16:01 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id RAA02320
	for tcp-impl-outgoing; Thu, 29 Jun 2000 17:08:56 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id RAA02259
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 17:08:51 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id RAA10112; Thu, 29 Jun 2000 17:08:50 -0400
Received: from web2902.mail.yahoo.com(128.11.68.45) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma010064; Thu, 29 Jun 00 17:08:21 -0400
Received: (qmail 12296 invoked by uid 60001); 29 Jun 2000 21:08:16 -0000
Message-ID: <20000629210816.12295.qmail@web2902.mail.yahoo.com>
Received: from [32.97.88.100] by web2902.mail.yahoo.com; Thu, 29 Jun 2000 14:08:16 PDT
Date: Thu, 29 Jun 2000 14:08:16 -0700 (PDT)
From: vijay singh <vijjus@rocketmail.com>
Subject: HP-UX -11.0  (FIN_WAIT_2)
To: tcp-impl@grc.nasa.gov
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello,

Is anyone on the list aware of problems in the HP-UX
TCP stack implementation wherein a connection in the
FIN_WAIT_2 state remains so indefinitely? This may
have happened as the application (client) exited
without closing the socket. Specifically I have the
following questions:

1. Is there a known problem like this?
2. Is there a way to drop a connection (with the tuple
obtained from netstat) without recycling the kernel?
3. What are the consequences of issuing a *shutdown*
and immediately following it with a *close*?

Any help is appreciated.
Vijay

=====
VIJAY SINGH
41 W Brown Street,
Apt# 7,
Somerville - NJ 08876.
Ph (H) - (908)575-3255
   (O) - (973)360-3148.
E-MAIL: vsingh@pershing.com

__________________________________________________
Do You Yahoo!?
Get Yahoo! Mail - Free email you can access from anywhere!
http://mail.yahoo.com/


From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 21:03:19 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA23930
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 21:03:13 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA10421
	for tcp-impl-outgoing; Thu, 29 Jun 2000 18:37:55 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id SAA10398
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 18:37:53 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id SAA19829; Thu, 29 Jun 2000 18:37:51 -0400
Received: from palrel1.hp.com(156.153.255.242) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma019805; Thu, 29 Jun 00 18:37:48 -0400
Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.8.80.176])
	by palrel1.hp.com (Postfix) with ESMTP id E6B53213
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 15:37:47 -0700 (PDT)
Received: from cup.hp.com (raj@localhost [127.0.0.1])
	by tardy.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id PAA21422
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 15:37:47 -0700 (PDT)
Message-ID: <395BCFBB.45EA9A04@cup.hp.com>
Date: Thu, 29 Jun 2000 15:37:47 -0700
From: Rick Jones <raj@cup.hp.com>
Organization: the Unofficial HP
X-Mailer: Mozilla 4.7 [en] (X11; U; HP-UX B.11.00 9000/785)
X-Accept-Language: en
MIME-Version: 1.0
To: tcp-impl@grc.nasa.gov
Subject: Re: HP-UX -11.0  (FIN_WAIT_2)
References: <20000629210816.12295.qmail@web2902.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

vijay singh wrote:
> 
> Hello,
> 
> is anyone on the list aware of problems on the HP UX
> TCP stack implementation, wherein a connection in the
> FIN_WAIT_2 state remains so indefinitely. This may
> have happened as the application (client) exited
> without closing the socket. Specifically I have the
> following questions:
> 
> 1. Is there a known problem like this?

Not that I happen to be aware of, but you can search the databases in
the HP "IT RC" (support web site) which you can find via links from
http://www.hp.com/ . Just how "indefinite" has your indefinite been
shown to be at this point?

> 2. Is there a way to drop a connection (with the tuple
> obtained from netstat) without recycling the kernel?

It shouldn't be necessary to do either - upon entering a "detached"
state the stack will automagically start a timer based on the setting of
tcp_keepalive_detached_interval (see ndd -h
tcp_keepalive_detached_interval for more info). If the remote is not
responding to the keepalive probes, the endpoint should go away within
tcp_ip_abort_interval milliseconds. 

If the remote is still responding to the probes, the endpoint will
remain.

How many of these endpoints are there on your system, and what do you
want to accomplish that their presence is precluding?

> 3. What are the consequences of issuing a *shutdown*
> and immediately following it with a *close*?

Nothing untoward - the netperf benchmark does something akin to that all
the time.
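
As an illustration of question 3, here is a minimal sketch of a shutdown followed immediately by a close. The helper name shutdown_then_close is hypothetical and is not taken from netperf or from any HP-UX source; it only shows the order of the two standard socket calls.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: half-close the sending side, then release the
 * descriptor. Returns 0 if both calls succeed, -1 otherwise. */
int shutdown_then_close(int fd)
{
    int rc = 0;

    /* SHUT_WR tells the peer no more data will be sent; on a TCP socket
     * this queues a FIN behind any data still waiting to go out. */
    if (shutdown(fd, SHUT_WR) < 0)
        rc = -1;

    /* close() is attempted even if shutdown() failed, so the descriptor
     * is not leaked. */
    if (close(fd) < 0)
        rc = -1;

    return rc;
}
```

Once the descriptor is closed, the application can no longer read the peer's remaining data, and the endpoint is left to the stack, which is the "detached" situation discussed above.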

rick jones
-- 
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...


From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 21:47:47 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA25229
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 21:47:46 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA14363
	for tcp-impl-outgoing; Thu, 29 Jun 2000 19:37:19 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id TAA14341
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 19:37:17 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id TAA25507; Thu, 29 Jun 2000 19:37:16 -0400
Received: from palrel1.hp.com(156.153.255.242) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma025483; Thu, 29 Jun 00 19:37:11 -0400
Received: from hpcuhe.cup.hp.com (hpcuhe.cup.hp.com [15.0.80.203])
	by palrel1.hp.com (Postfix) with ESMTP
	id 5F072BE6; Thu, 29 Jun 2000 16:37:10 -0700 (PDT)
Received: from cup.hp.com (scotty@hpindsdl.cup.hp.com [15.13.131.111])
	by hpcuhe.cup.hp.com (8.9.3 (PHNE_18979)/8.9.3 SMKit7.02) with ESMTP id QAA18507;
	Thu, 29 Jun 2000 16:37:06 -0700 (PDT)
Message-ID: <395BDD38.BA962F8F@cup.hp.com>
Date: Thu, 29 Jun 2000 16:35:20 -0700
From: Scott Millward <scotty@cup.hp.com>
Organization: Hewlett Packard - SISL Lab
X-Mailer: Mozilla 4.72 [en] (X11; U; HP-UX B.10.20 9000/785)
X-Accept-Language: en
MIME-Version: 1.0
To: vijay singh <vijjus@rocketmail.com>
Cc: tcp-impl@grc.nasa.gov
Subject: Re: HP-UX -11.0  (FIN_WAIT_2)
References: <20000629210816.12295.qmail@web2902.mail.yahoo.com>
Content-Type: multipart/mixed;
 boundary="------------FCA1553A586693ECAA393733"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------FCA1553A586693ECAA393733
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hello,

On 11.0 there is an algorithm that extends the keepalive probes during
the FIN_WAIT_2 state.  This was to prevent a time-based drop of the
connection half without really knowing if the other half was really
gone, or just delaying their close.  The BSD Stack has a timer that
silently drops the connections in FIN_WAIT_2 after a specified interval.

Due to popular demand ;-) .. The old BSD style FIN_WAIT_2 timer can be
enabled in the HPUX 11.0 stack if you have a recent Transport Patch
level. (The latest patch ID for Transport as of today is PHNE_21767)

Following is a write up on how you would enable this:

--------------------------------------------------------------------

There is an ndd parameter, post-patch PHNE_19375/11.0, that is called

  tcp_fin_wait_2_timeout

This parameter sets the fin_wait_2 timer on 11.X to stop idle fin_wait_2
connections. It will not survive a reboot, so modifying
/etc/rc.config.d/nddconf is necessary.


It specifies an interval, in milliseconds, after which the TCP connection
will be unconditionally killed.  An appropriate reset segment will be sent
when the connection is killed.

The default for tcp_fin_wait_2_timeout is 0, which allows the connection
to live forever, as long as the far side continues to answer keepalives.

To enable the FIN_WAIT_2 timer, do the following:

1. Get the current value (0 means the timer is turned off):
   # ndd -get /dev/tcp tcp_fin_wait_2_timeout
   0

2. Set the value to 20 minutes:
   # ndd -set /dev/tcp tcp_fin_wait_2_timeout 1200000

3. Check the setting:
   # ndd -get /dev/tcp tcp_fin_wait_2_timeout
   1200000


Note: (1000 ms in 1 second) * (60 seconds) * (20 minutes)= 1200000 ms.
20 minutes is just an example but probably a good selection.
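
The arithmetic in the note, and the "0 means never" rule described earlier, can be sketched in a few lines of C. Both function names are hypothetical and this is only a model of the behaviour described in this write-up, not code from the HP-UX stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert minutes to the millisecond value that
 * tcp_fin_wait_2_timeout expects. */
uint32_t minutes_to_ms(uint32_t minutes)
{
    return minutes * 60u * 1000u;
}

/* Model of the described semantics (an assumption, not stack code):
 * a timeout of 0 disables the timer, otherwise the connection is
 * dropped once it has sat in FIN_WAIT_2 for at least timeout_ms. */
bool fin_wait_2_should_drop(uint32_t timeout_ms, uint32_t idle_ms)
{
    return timeout_ms != 0 && idle_ms >= timeout_ms;
}
```

For the 20-minute example, minutes_to_ms(20) gives the 1200000 used in the ndd commands.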


This will not survive a reboot, so you need to update

  /etc/rc.config.d/nddconf

with the parameter so that it will be set at boot time.

  TRANSPORT_NAME[0]=tcp
  NDD_NAME[0]=tcp_fin_wait_2_timeout
  NDD_VALUE[0]=1200000



NOTE: patches may be superseded, verify your patch levels via the patch
database on the IT Resource Center.

For more information regarding the Transmission Control Protocol, see RFC
793.






vijay singh wrote:
> 
> Hello,
> 
> is anyone on the list aware of problems on the HP UX
> TCP stack implementation, wherein a connection in the
> FIN_WAIT_2 state remains so indefinitely. This may
> have happened as the application (client) exited
> without closing the socket. Specifically I have the
> following questions:
> 
> 1. Is there a known problem like this?
> 2. Is there a way to drop a connection (with the tuple
> obtained from netstat) without recycling the kernel?
> 3. What are the consequences of issuing a *shutdown*
> and immediately following it with a *close*?
> 
> Any help is appreciated.
> Vijay
> 
> =====
> VIJAY SINGH
> 41 W Brown Street,
> Apt# 7,
> Somerville - NJ 08876.
> Ph (H) - (908)575-3255
>    (O) - (973)360-3148.
> E-MAIL: vsingh@pershing.com
> 
--------------FCA1553A586693ECAA393733
Content-Type: text/x-vcard; charset=us-ascii;
 name="scotty.vcf"
Content-Description: Card for Scott Millward
Content-Disposition: attachment;
 filename="scotty.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Millward;Scott
tel;fax:916-748-1415
tel;work:408-447-1861
x-mozilla-html:FALSE
url:http://hpindsdl.cup.hp.com/~scotty
org:Hewlett Packard - SISL Lab;Lan Section
adr:;;;Cupertino ;California;;
version:2.1
email;internet:scotty@cup.hp.com
title:Scott Millward <scotty@cup.hp.com>
end:vcard

--------------FCA1553A586693ECAA393733--



From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 21:49:25 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA25245
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 21:49:24 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA14306
	for tcp-impl-outgoing; Thu, 29 Jun 2000 19:36:20 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id TAA14272
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 19:36:18 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id TAA25410; Thu, 29 Jun 2000 19:36:16 -0400
Received: from prv-mail21.provo.novell.com(137.65.81.126) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma025248; Thu, 29 Jun 00 19:35:13 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail21.provo.novell.com; Thu, 29 Jun 2000 17:34:44 -0600
Message-ID: <395BDD13.1E54F741@Novell.COM>
Date: Thu, 29 Jun 2000 17:34:43 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: kuznet@ms2.inr.ac.ru, TCP-IMPL <tcp-impl@grc.nasa.gov>
CC: Ramesh Shankar <RShankar@novell.com>
Subject: Re: Send window update algorithm ...
References: <200006291701.VAA15375@ms2.inr.ac.ru>
Content-Type: multipart/mixed;
 boundary="------------955C09B3CA445EB478861A2D"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------955C09B3CA445EB478861A2D
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

The case of pure ACK segments (aka window updates) coming out of order
is not handled by the last check.

Here is the one that I quoted:

Set SND.WND = SEG.WND: [Assuming Window scaling is correctly handled].
if (SEG.SEQ > SND.WL1) [Set SND.WL1 = SEG.SEQ]
OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]
OR if ((SND.WL2 == SEG.ACK) && (SEG.WND > SND.WND))

It is assumed that the segment has been already trimmed to be inside the
window and the ACK satisfies the check:

SEG.ACK <= SND.UNA <= SND.MAX.

I still retain SND.WL2 as a "local variable" which contains the previous
value of SND.UNA. [When these checks are done, SND.UNA would have been
already set to SEG.ACK. You could save it in a local variable before the
update and use it here. Doesn't matter].

The above correctly handles the following cases:

- Pure ACKs (aka window updates) coming out of order. These will have
same SEG.SEQ, SEG.ACK and (legally) increasing window sizes.
- ACK information sent in retransmitted segments. These will have
SEG.SEQ < SND.WL1, SEG.ACK >= SND.WL2.
- Window can be shrunk (even if strongly discouraged by RFC 1122) with
SEG.SEQ > SND.WL1, SEG.ACK > SND.WL2, but NOT otherwise.
- The case of segments coming out of order is anyway automatically
handled as such a segment will have SEG.ACK <= SND.WL2.
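
One possible reading of the rule above, written out as C. This is a sketch under stated assumptions: the struct and function names are hypothetical, SND.WL2 is taken to hold the previous SND.UNA as described, and the segment is assumed to have passed the window and ACK checks already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" for 32-bit sequence numbers. */
static bool seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Hypothetical container for the send-side variables used by the rule. */
struct snd_state {
    uint32_t wl1;  /* SND.WL1: SEG.SEQ of the last window update */
    uint32_t wl2;  /* SND.WL2: previous SND.UNA, per the text above */
    uint32_t wnd;  /* SND.WND: current send window */
};

/* Apply the three-way check; returns true if the window was updated. */
bool update_send_window(struct snd_state *s, uint32_t seg_seq,
                        uint32_t seg_ack, uint32_t seg_wnd)
{
    bool newer_seq = seq_gt(seg_seq, s->wl1);
    bool newer_ack = seq_gt(seg_ack, s->wl2);
    bool same_ack_bigger_wnd = (seg_ack == s->wl2) && (seg_wnd > s->wnd);

    if (!newer_seq && !newer_ack && !same_ack_bigger_wnd)
        return false;

    s->wnd = seg_wnd;
    if (newer_seq)
        s->wl1 = seg_seq;
    if (newer_ack)
        s->wl2 = seg_ack;
    return true;
}
```

With this reading, a reordered pure ACK that carries a smaller window than the one already recorded is ignored, while a retransmitted segment with an older SEG.SEQ but a newer SEG.ACK still updates the window.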

Thanks,

S.R.

kuznet@ms2.inr.ac.ru wrote:
> 
> Hello!
> 
> > I am not sure what you mean by "not accepted due to the first rule".
> 
> Seems, I understand source of misunderstanding. 8)
> 
> That guy, who fixed RFC algorithm old days, forgot to add one level
> of parenthesis in hurry. 8)
> 
>         if (((tiflags & TH_ACK) && SEQ_LT(tp->snd_wl1, ti->ti_seq)) ||
>             (tp->snd_wl1 == ti->ti_seq && (SEQ_LT(tp->snd_wl2, ti->ti_ack)) ||
>                                           ^
>             (tp->snd_wl2 == ti->ti_ack && tiwin > tp->snd_wnd))) {
>                                                                ^
> 
> Well, accepting window update with arbitrary seq _only_ from dupacks,
> but ignoring it for real good acks is evident misprint. 8)
> 
> It does not change anything but adding more mess.
> 
> So, do you see something wrong in:
> 
>         if ((tiflags & TH_ACK) &&
>              (SEQ_LT(tp->snd_una, ti->ti_ack) ||
>               SEQ_LT(tp->snd_wl1, ti->ti_seq) ||
>               (tp->snd_wl1==ti->ti_seq && tiwin > tp->snd_wnd)))
> 
> I.e.
> 
> 1. new ACK always updates window.
> 2. dupack updates window, if it advances SND.WL1,
>    or its SEG.SEQ==SND.WL1, but advertised window increases.
> 
> SND.WL2 is not used, it is ==SND.UNA.
> 
> Alexey
--------------955C09B3CA445EB478861A2D
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------955C09B3CA445EB478861A2D--



From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 21:51:59 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA25265
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 21:51:59 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA14851
	for tcp-impl-outgoing; Thu, 29 Jun 2000 19:44:22 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id TAA14831
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 19:44:19 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id TAA26180; Thu, 29 Jun 2000 19:44:18 -0400
Received: from prv-mail21.provo.novell.com(137.65.81.126) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma026125; Thu, 29 Jun 00 19:43:31 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail21.provo.novell.com; Thu, 29 Jun 2000 17:43:05 -0600
Message-ID: <395BDF06.61D523C4@Novell.COM>
Date: Thu, 29 Jun 2000 17:43:02 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: vijay singh <vijjus@rocketmail.com>
CC: tcp-impl@grc.nasa.gov
Subject: Re: HP-UX -11.0  (FIN_WAIT_2)
References: <20000629210816.12295.qmail@web2902.mail.yahoo.com>
Content-Type: multipart/mixed;
 boundary="------------29B1FB67C2490D04DC9860C1"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------29B1FB67C2490D04DC9860C1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Did you check up RFC 2525, known TCP implementation problems? I thought
this one was mentioned in there.

Thanks,

S.R.

vijay singh wrote:
> 
> Hello,
> 
> is anyone on the list aware of problems on the HP UX
> TCP stack implementation, wherein a connection in the
> FIN_WAIT_2 state remains so indefinitely. This may
> have happened as the application (client) exited
> without closing the socket. Specifically I have the
> following questions:
> 
> 1. Is there a known problem like this?
> 2. Is there a way to drop a connection (with the tuple
> obtained from netstat) without recycling the kernel?
> 3. What are the consequences of issuing a *shutdown*
> and immediately following it with a *close*?
> 
> Any help is appreciated.
> Vijay
> 
> =====
> VIJAY SINGH
> 41 W Brown Street,
> Apt# 7,
> Somerville - NJ 08876.
> Ph (H) - (908)575-3255
>    (O) - (973)360-3148.
> E-MAIL: vsingh@pershing.com
> 
--------------29B1FB67C2490D04DC9860C1--



From owner-tcp-impl@lerc.nasa.gov  Thu Jun 29 23:05:17 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id XAA26868
	for <tcpimpl-archive@odin.ietf.org>; Thu, 29 Jun 2000 23:05:16 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id UAA17919
	for tcp-impl-outgoing; Thu, 29 Jun 2000 20:37:30 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id UAA17896
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 20:37:28 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id UAA00701; Thu, 29 Jun 2000 20:37:27 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma000685; Thu, 29 Jun 00 20:37:25 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id RAA07133
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 17:37:24 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.83.130])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id RAA06365
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 17:37:24 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5U0bKj526278
	for <tcp-impl@grc.nasa.gov>; Thu, 29 Jun 2000 17:37:21 -0700 (PDT)
Date: Thu, 29 Jun 2000 17:37:12 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <395BDD13.1E54F741@Novell.COM>
Message-ID: <Roam.SIMC.2.0.6.962325432.19099.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> The case of pure ACK segments (aka window updates) coming out of order
> is not handled by the last check.

By pure window updates, do you mean that SEG.SEQ and SEG.ACK stay the same?
I think the check handles that.  If SEG.ACK < SND.UNA, the flow will not 
go to the update path.  So the last check is equivalent to

	SEG.ACK == SND.WL2 && SND.WL1 == SEG.SEQ && SEG.WND > SND.WND

Isn't this pure window update handling since SND.WL2 is just equal to
SEG.ACK for pure window updates?  Or do you mean another thing?  Actually,
I think your revised check is equivalent to Alexey's check, except the order
of checking is different.

> It is assumed that the segment has been already trimmed to be inside the
> window and the ACK satisfies the check:
> 
> SEG.ACK <= SND.UNA <= SND.MAX.

Typo?  I guess you mean SND.UNA <= SEG.ACK <= SND.MAX.
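
For reference, the corrected precondition can be written with wraparound-safe comparisons in the style of the BSD SEQ_* macros. The function names here are hypothetical; this is only a sketch of the check SND.UNA <= SEG.ACK <= SND.MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a <= b" for 32-bit sequence numbers: the difference
 * is interpreted as a signed 32-bit value. */
static bool seq_leq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) <= 0;
}

/* True when SND.UNA <= SEG.ACK <= SND.MAX in sequence space. */
bool ack_is_acceptable(uint32_t snd_una, uint32_t seg_ack, uint32_t snd_max)
{
    return seq_leq(snd_una, seg_ack) && seq_leq(seg_ack, snd_max);
}
```

The signed-difference form keeps the check correct when the sequence space wraps past 2^32, as long as the values compared are less than 2^31 apart.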

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 11:36:04 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id LAA19767
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 11:36:03 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id JAA03561
	for tcp-impl-outgoing; Fri, 30 Jun 2000 09:18:38 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id JAA03519
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 09:18:32 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id JAA27592; Fri, 30 Jun 2000 09:18:30 -0400
Received: from prv-mail25.provo.novell.com(137.65.81.121) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma027541; Fri, 30 Jun 00 09:17:57 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by prv-mail25.provo.novell.com; Fri, 30 Jun 2000 07:17:23 -0600
Message-ID: <395C9DE5.DB28BE7E@Novell.COM>
Date: Fri, 30 Jun 2000 07:17:25 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: kuznet@ms2.inr.ac.ru, TCP-IMPL <tcp-impl@grc.nasa.gov>
Subject: Re: Send window update algorithm ...
References: <200006291701.VAA15375@ms2.inr.ac.ru> <395BDD13.1E54F741@Novell.COM>
Content-Type: multipart/mixed;
 boundary="------------9A22F7CC7A3BD6B0B85E785F"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------9A22F7CC7A3BD6B0B85E785F
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Ramesh Shankar wrote:
> 
> The case of pure ACK segments (aka window updates) coming out of order
> is not handled by the last check.
> 
> Here is the one that I quoted:
> 
> Set SND.WND = SEG.WND: [Assuming Window scaling is correctly handled].
> if (SEG.SEQ > SND.WL1) [Set SND.WL1 = SEG.SEQ]
> OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]
> OR if ((SND.WL2 == SEG.ACK) && (SEG.WND > SND.WND))
> 

What I meant to say was that when the receiver is retransmitting, SEG.SEQ
will be < SND.WL1. Hence, the third check must use SND.WL2 to handle
pure ACKs carrying window updates. This was one of the issues that I
started the whole discussion with :-)).

Thanks,

S.R.
--------------9A22F7CC7A3BD6B0B85E785F--



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 16:59:39 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id QAA26326
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 16:59:39 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA21153
	for tcp-impl-outgoing; Fri, 30 Jun 2000 14:45:45 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id OAA21119
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 14:45:40 -0400 (EDT)
From: kuznet@ms2.inr.ac.ru
Received: by seraph3.lerc.nasa.gov; id OAA05459; Fri, 30 Jun 2000 14:45:39 -0400
Received: from minus.inr.ac.ru(193.233.7.97) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma005408; Fri, 30 Jun 00 14:45:21 -0400
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA25907; Fri, 30 Jun 2000 22:45:16 +0400
Message-Id: <200006301845.WAA25907@ms2.inr.ac.ru>
Subject: Re: Send window update algorithm ...
To: RShankar@novell.com (Ramesh Shankar)
Date: Fri, 30 Jun 2000 22:45:15 +0400 (MSK DST)
Cc: tcp-impl@grc.nasa.gov, RShankar@novell.com
In-Reply-To: <395BDD13.1E54F741@Novell.COM> from "Ramesh Shankar" at Jun 29, 0 05:34:43 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello!

> Set SND.WND = SEG.WND: [Assuming Window scaling is correctly handled].
> if (SEG.SEQ > SND.WL1) [Set SND.WL1 = SEG.SEQ]
> OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]
> OR if ((SND.WL2 == SEG.ACK) && (SEG.WND > SND.WND))

Hey, stop! I tried to code this and immediately found, that
window is updated if:

	if (SEG.SEQ > SND.WL1 ||
	    SEG.ACK > SND.UNA ||
	    SEG.WND > SND.WND)

In other words, it accepts each window expansion not depending on anything.

Are you sure, that you wanted this?
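
The reduction can be checked by brute force over a small range of values. The sketch below is only an illustration: all names are hypothetical, plain ints are used so sequence wraparound is ignored, and SND.WL2 is taken to equal SND.UNA as stated earlier in the thread.

```c
#include <stdbool.h>

/* The rule as posted: a larger window counts only when the ACK is unchanged. */
static bool posted_rule(int seq, int wl1, int ack, int una, int wnd, int snd_wnd)
{
    return seq > wl1 || ack > una || (ack == una && wnd > snd_wnd);
}

/* The reduced form: any larger window is accepted. */
static bool reduced_rule(int seq, int wl1, int ack, int una, int wnd, int snd_wnd)
{
    return seq > wl1 || ack > una || wnd > snd_wnd;
}

/* Count the value combinations in [0, n) on which the two rules disagree.
 * With require_valid_ack set, old ACKs (ack < una) are skipped, matching
 * the assumption that such segments never reach the update path. */
int count_mismatches(int n, bool require_valid_ack)
{
    int mismatches = 0;
    for (int seq = 0; seq < n; seq++)
    for (int wl1 = 0; wl1 < n; wl1++)
    for (int ack = 0; ack < n; ack++)
    for (int una = 0; una < n; una++)
    for (int wnd = 0; wnd < n; wnd++)
    for (int snd_wnd = 0; snd_wnd < n; snd_wnd++) {
        if (require_valid_ack && ack < una)
            continue;
        if (posted_rule(seq, wl1, ack, una, wnd, snd_wnd) !=
            reduced_rule(seq, wl1, ack, una, wnd, snd_wnd))
            mismatches++;
    }
    return mismatches;
}
```

When old ACKs are filtered out first, the two rules agree on every combination, because an ACK that does not advance must equal SND.UNA. The rules differ only if segments with SEG.ACK < SND.UNA are allowed to reach the check.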

Alexey


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 17:07:07 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA26491
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 17:07:06 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id OAA19083
	for tcp-impl-outgoing; Fri, 30 Jun 2000 14:33:26 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id OAA18607
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 14:28:28 -0400 (EDT)
From: kuznet@ms2.inr.ac.ru
Received: by seraph3.lerc.nasa.gov; id OAA03369; Fri, 30 Jun 2000 14:28:26 -0400
Received: from minus.inr.ac.ru(193.233.7.97) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma003336; Fri, 30 Jun 00 14:28:19 -0400
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA25813; Fri, 30 Jun 2000 22:28:08 +0400
Message-Id: <200006301828.WAA25813@ms2.inr.ac.ru>
Subject: Re: Send window update algorithm ...
To: Kacheong.Poon@Eng.Sun.COM
Date: Fri, 30 Jun 2000 22:28:08 +0400 (MSK DST)
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: <Roam.SIMC.2.0.6.962325432.19099.kcpoon@jurassic> from "Kacheong Poon" at Jun 29, 0 05:37:12 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

Hello!

> I think your revised check is equivalent to Alexey's check, except the order
> of checking is different.

No, they do not coincide. And Ramesh's version seems to be more clever.

Look:

> OR if ((SND.WL2 == SEG.ACK) && (SEG.WND > SND.WND))

includes SND.WL1==SEG.SEQ in my version. I did this because
otherwise it will result in spurious window reopens by old segments.

That is not the problem we were fighting. The problem is that a window
update must be accepted when the ACK advances, and that is solved in the
second line.

I understand why the check is omitted. It is because of the second
difference. My version allows WL1 to go back when the update occurs from
a new ACK. This one prohibits this (the first line), so that the check
for SND.WL1==SEG.ACK in the third line becomes too restrictive.

The only doubt is that omitting this check looks like artificial stimulation
of window opening. Following this line, we simply can prohibit
(rather than "strongly discourage" 8)) window shrinking at all,
it will be simpler.

Alexey


From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 17:26:08 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA26873
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 17:26:08 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA23939
	for tcp-impl-outgoing; Fri, 30 Jun 2000 15:05:01 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id PAA23913
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 15:04:59 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA07702; Fri, 30 Jun 2000 15:04:58 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma007668; Fri, 30 Jun 00 15:04:45 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id MAA02067;
	Fri, 30 Jun 2000 12:04:42 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.87.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id MAA24089;
	Fri, 30 Jun 2000 12:04:42 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5UJ4dj642904;
	Fri, 30 Jun 2000 12:04:39 -0700 (PDT)
Date: Fri, 30 Jun 2000 12:04:31 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: tcp-impl@grc.nasa.gov
Cc: RShankar@novell.com
In-Reply-To: "Your message with ID" <395C9DE5.DB28BE7E@Novell.COM>
Message-ID: <Roam.SIMC.2.0.6.962391871.6789.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> What I meant to say was that when receiver is retransmitting, SEG.SEQ
> will be < SND.WL1. Hence, the third check must use SND.WL2 to handle
> pure ACKs having window updates. This was one of the issues that I
> started off the whole discussion with :-)).

Hmm, in the original mail, you mentioned that it was a bi-directional
transfer.  So the retransmission should better ack some data.  If it
ack's some data, then the window update should be accepted.  It seems
to me that what you are suggesting here is that TCP should accept window
updates from old segment without new data being acked.  I don't think
it is a good idea.  How can you know for sure that the segment you get
is really a retransmission or just the old original one, if the ack
stays the same.  Do you really mean that?

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 17:31:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA27010
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 17:31:29 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA25170
	for tcp-impl-outgoing; Fri, 30 Jun 2000 15:14:11 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id PAA25151
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 15:14:09 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA08752; Fri, 30 Jun 2000 15:14:08 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma008713; Fri, 30 Jun 00 15:13:48 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id MAA05554;
	Fri, 30 Jun 2000 12:13:44 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.87.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id MAA26191;
	Fri, 30 Jun 2000 12:13:44 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5UJDhj644337;
	Fri, 30 Jun 2000 12:13:43 -0700 (PDT)
Date: Fri, 30 Jun 2000 12:13:35 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: kuznet@ms2.inr.ac.ru
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <200006301828.WAA25813@ms2.inr.ac.ru>
Message-ID: <Roam.SIMC.2.0.6.962392415.18491.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Look:
> 
> > OR if ((SND.WL2 == SEG.ACK) && (SEG.WND > SND.WND))
> 
> includes SND.WL1==SEG.SEQ in my version. I did this because
> otherwise it will result in spurious window reopens by old segments.

Yup, I missed this line.  But I think we should include SND.WL1==SEG.SEQ.
I just replied to Ramesh asking whether he really means that he wants TCP
to accept window update from "old" segment which does not ack new data.  I
think the problem is that TCP cannot know whether the newly received "old"
segment is really a retransmission or the original old segment if ack stays
the same.

> I understand, why the check is omitted. It is because of the second
> difference. My version allows WL1 to go back, when update occurs from
> new ACK. This one prohibit this (the first line), so that check
> for SND.WL1==SEG.ACK in the third line becomes too restrictive.

I think it is too "liberal" instead of too "restrictive."  It just means that
whenever a segment comes in with a larger window, TCP will accept the update. 
I don't think this is correct.

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 17:35:52 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id RAA27062
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 17:35:52 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id PAA25549
	for tcp-impl-outgoing; Fri, 30 Jun 2000 15:17:13 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id PAA25524
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 15:17:10 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id PAA09027; Fri, 30 Jun 2000 15:17:09 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma008986; Fri, 30 Jun 00 15:16:39 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id MAA06598;
	Fri, 30 Jun 2000 12:16:35 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.86.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id MAA26924;
	Fri, 30 Jun 2000 12:16:35 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5UJGYj644669;
	Fri, 30 Jun 2000 12:16:34 -0700 (PDT)
Date: Fri, 30 Jun 2000 12:16:26 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: kuznet@ms2.inr.ac.ru
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <200006301845.WAA25907@ms2.inr.ac.ru>
Message-ID: <Roam.SIMC.2.0.6.962392586.5100.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Hey, stop! I tried to code this and immediately found, that
> window is updated if:
> 
> 	if (SEG.SEQ > SND.WL1 ||
> 	    SEG.ACK > SND.UNA ||
> 	    SEG.WND > SND.WND)
> 
> In other words, it accepts each window expansion not depending on anything.

Interesting!  So one really needs to write the code to really know what
one is thinking C:  It is not obvious to me from reading those checks that
they can be reduced to the above!
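
One way to convince oneself of the reduction is to check it by brute force.
The sketch below is not code from this thread: the function names are
hypothetical, it uses plain integer comparisons (no sequence wraparound),
and it assumes SND.WL2 == SND.UNA and that ACK validation has already
guaranteed SEG.ACK >= SND.WL2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed check as quoted in the thread (plain compares, no wraparound). */
static bool proposed_accepts(uint32_t seg_seq, uint32_t seg_ack, uint32_t seg_wnd,
                             uint32_t snd_wl1, uint32_t snd_wl2, uint32_t snd_wnd)
{
    return seg_seq > snd_wl1 ||
           seg_ack > snd_wl2 ||
           (snd_wl2 == seg_ack && seg_wnd > snd_wnd);
}

/* Reduced form. Assumption: SND.WL2 == SND.UNA, so snd_wl2 stands in for it. */
static bool reduced_accepts(uint32_t seg_seq, uint32_t seg_ack, uint32_t seg_wnd,
                            uint32_t snd_wl1, uint32_t snd_wl2, uint32_t snd_wnd)
{
    return seg_seq > snd_wl1 ||
           seg_ack > snd_wl2 ||
           seg_wnd > snd_wnd;
}

/* Counts the inputs in [0, limit) on which the two forms disagree, looking
 * only at segments that passed ACK validation (seg_ack >= snd_wl2). */
static unsigned count_mismatches(uint32_t limit)
{
    unsigned mismatches = 0;
    for (uint32_t seq = 0; seq < limit; seq++)
    for (uint32_t ack = 0; ack < limit; ack++)
    for (uint32_t wnd = 0; wnd < limit; wnd++)
    for (uint32_t wl1 = 0; wl1 < limit; wl1++)
    for (uint32_t wl2 = 0; wl2 < limit; wl2++)
    for (uint32_t cur = 0; cur < limit; cur++) {
        if (ack < wl2)
            continue; /* would have been rejected by ACK validation */
        if (proposed_accepts(seq, ack, wnd, wl1, wl2, cur) !=
            reduced_accepts(seq, ack, wnd, wl1, wl2, cur))
            mismatches++;
    }
    return mismatches;
}
```

The two forms differ only when SEG.ACK < SND.WL2, which is exactly the
case the earlier ACK validation is assumed to exclude.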

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 18:48:51 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id SAA27988
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 18:48:51 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id QAA05789
	for tcp-impl-outgoing; Fri, 30 Jun 2000 16:35:25 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id QAA05771
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 16:35:22 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id QAA18223; Fri, 30 Jun 2000 16:35:21 -0400
Received: from cpl-mail1.cpl.novell.com(147.2.71.20) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma018184; Fri, 30 Jun 00 16:35:13 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by cpl-mail1.cpl.novell.com; Fri, 30 Jun 2000 22:34:33 +0200
Message-ID: <395D0456.CAF0613A@Novell.COM>
Date: Fri, 30 Jun 2000 14:34:31 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: kuznet@ms2.inr.ac.ru
CC: tcp-impl@grc.nasa.gov
Subject: Re: Send window update algorithm ...
References: <200006301845.WAA25907@ms2.inr.ac.ru>
Content-Type: multipart/mixed;
 boundary="------------47BCF4E9EF6ED6ED270CB253"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------47BCF4E9EF6ED6ED270CB253
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

The original algorithm accepts window updates if:

SEG.SEQ > SND.WL1

OR if ((SEG.SEQ == SND.WL1) AND (SEG.ACK > SND.WL2))

OR if ((SEG.ACK == SND.WL2) AND (SEG.WND > SND.WND))

All that we have really changed is check #2, which doesn't handle the
case of retransmits involving bidirectional data transfer. The revised
check is reproduced here again:

OR if (SEG.ACK > SND.WL2)

Unless someone explains the reason behind not using window information
sent by receiver during retransmits, the above modification would
correctly handle things.

So, window information from a received ACK would be used in the following
cases (assuming trimming and basic ACK validation have already been done):

- If the segment has new data
- If the segment ACKs new data
- If this is a window update [announcing a bigger window] that is not
out of order.

Let me know whether there are any issues.

Thanks,

S.R.

kuznet@ms2.inr.ac.ru wrote:
> 
> Hello!
> 
> > Set SND.WND = SEG.WND: [Assuming Window scaling is correctly handled].
> > if (SEG.SEQ > SND.WL1) [Set SND.WL1 = SEG.SEQ]
> > OR if (SEG.ACK > SND.WL2) [Set SND.WL2 = SEG.ACK]
> > OR if ((SND.WL2 == SEG.ACK) && (SEG.WND > SND.WND))
> 
> Hey, stop! I tried to code this and immediately found, that
> window is updated if:
> 
>         if (SEG.SEQ > SND.WL1 ||
>             SEG.ACK > SND.UNA ||
>             SEG.WND > SND.WND)
> 
> In other words, it accepts each window expansion not depending on anything.
> 
> Are you sure, that you wanted this?
> 
> Alexey
--------------47BCF4E9EF6ED6ED270CB253
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------47BCF4E9EF6ED6ED270CB253--



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 19:04:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id TAA28224
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 19:04:29 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id QAA07936
	for tcp-impl-outgoing; Fri, 30 Jun 2000 16:55:42 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id QAA07900
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 16:55:38 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id QAA20522; Fri, 30 Jun 2000 16:55:37 -0400
Received: from cpl-mail1.cpl.novell.com(147.2.71.20) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma020493; Fri, 30 Jun 00 16:55:09 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by cpl-mail1.cpl.novell.com; Fri, 30 Jun 2000 22:54:55 +0200
Message-ID: <395D091C.B4068F76@Novell.COM>
Date: Fri, 30 Jun 2000 14:54:52 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>,
        TCP-IMPL <tcp-impl@grc.nasa.gov>
CC: Ramesh Shankar <RShankar@novell.com>
Subject: Re: Send window update algorithm ...
References: <Roam.SIMC.2.0.6.962391871.6789.kcpoon@jurassic>
Content-Type: multipart/mixed;
 boundary="------------D299883CD477A671CC088223"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------D299883CD477A671CC088223
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Isn't it possible for a receiver who is retransmitting to us to send
window updates (when the application at the receiving side picked up new
data)? Also, when the receiver is retransmitting, not every segment
needs to ACK new data (perhaps we stopped temporarily, perhaps our data
didn't make it to the receiver), even if it were bidirectional. 

Here is a solid example. In a bidirectional data transfer, the receiver
advertised a zero window because we had filled its window. The receiver
then timed out and retransmitted (SEG.SEQ < SND.WL1), and when the
application at the receiver picked up the data, it sent a window update
with SEG.ACK = SND.WL2. Note that we will be doing window probes (and
getting 0 window responses until the application at the receiver picks up
enough data). Remember that SEG.SEQ for this window update would be less
than SND.WL1. Hence, if we ignore this window update (or use the 3rd check
mentioned by Alexey), we won't be able to send anything until the receiver
has retransmitted all lost data and its SEG.SEQ crosses SND.WL1.

I don't know what you mean by "old" segments. The basic check of SND.UNA
<= SEG.ACK <= SND.MAX must have been already made when we do the send
window update check. Hence, suppose the receiver sent an ACK which got
delayed and, during retransmit, sent another plain ACK (with a higher ACK
#). If we updated SND.WL2 from the second ACK and the delayed ACK then
arrives, we would not accept that old ACK, because its SEG.ACK < SND.UNA
(and essentially SND.WL2).
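
For reference, the basic ACK check mentioned above could be sketched as
follows. This is only an illustration: the helper names are hypothetical,
and the signed-difference comparison is a common way to handle 32-bit
sequence wraparound rather than anything quoted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a <= b" for 32-bit TCP sequence numbers. Relies on the
 * usual two's-complement conversion of the unsigned difference to int32_t. */
static bool seq_leq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) <= 0;
}

/* True when SND.UNA <= SEG.ACK <= SND.MAX in sequence space. */
static bool ack_is_acceptable(uint32_t snd_una, uint32_t seg_ack, uint32_t snd_max)
{
    return seq_leq(snd_una, seg_ack) && seq_leq(seg_ack, snd_max);
}
```

Note that the lower bound is inclusive: a delayed segment whose SEG.ACK
equals SND.UNA still passes this check, so only segments with a strictly
older ACK are filtered out here.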

Am I missing something very obvious?

Thanks,

S.R.

Kacheong Poon wrote:
> 
> > What I meant to say was that when receiver is retransmitting, SEG.SEQ
> > will be < SND.WL1. Hence, the third check must use SND.WL2 to handle
> > pure ACKs having window updates. This was one of the issues that I
> > started off the whole discussion with :-)).
> 
> Hmm, in the original mail, you mentioned that it was a bi-directional
> transfer.  So the retransmission should better ack some data.  If it
> ack's some data, then the window update should be accepted.  It seems
> to me that what you are suggesting here is that TCP should accept window
> updates from old segment without new data being acked.  I don't think
> it is a good idea.  How can you know for sure that the segment you get
> is really a retransmission or just the old original one, if the ack
> stays the same.  Do you really mean that?
> 
>                                                         K. Poon.
>                                                         kcpoon@eng.sun.com
--------------D299883CD477A671CC088223
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------D299883CD477A671CC088223--



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 20:18:23 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA29106
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 20:18:23 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA13038
	for tcp-impl-outgoing; Fri, 30 Jun 2000 18:00:19 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id SAA13017
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 18:00:17 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id SAA27135; Fri, 30 Jun 2000 18:00:15 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma027057; Fri, 30 Jun 00 17:59:46 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id OAA03915;
	Fri, 30 Jun 2000 14:59:44 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.85.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id OAA03433;
	Fri, 30 Jun 2000 14:59:42 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5ULxdj672329;
	Fri, 30 Jun 2000 14:59:40 -0700 (PDT)
Date: Fri, 30 Jun 2000 14:59:29 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: tcp-impl@grc.nasa.gov
Cc: RShankar@novell.com
In-Reply-To: "Your message with ID" <395D091C.B4068F76@Novell.COM>
Message-ID: <Roam.SIMC.2.0.6.962402369.23143.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Isn't it possible for a receiver who is retransmitting to us to send
> window updates (when the application at the receiving side picked up new
> data)? Also, when the receiver is retransmitting, not every segment
> needs to ACK new data (perhaps we stopped temporarily, perhaps our data
> didn't make it to the receiver), even if it were bidirectional. 

What I was trying to say is that without ack'ing new data, the receiver
of the segment does not know whether it is the original segment A or a
retransmission of segment A.  That's what I was asking.  I am not saying
that a retransmitted segment cannot have a window update.  If A is somehow
delayed but received and the retransmitted segment is lost, should TCP use
the window update info in the original segment A?  I think this case can
confuse TCP, so we may not want the window update.  This is the question
I tried to ask you.  Your answer can be yes, TCP should take it.

> Here is a solid example. Bidirectional data transfer, receiver
> advertised zero window, as we filled receiver's window, receiver timed
> out and retransmitted (SEG.SEQ < SND.WL1) and then when the application
> at the receiver picked up the data, it sent a window update, with
> SEG.ACK = SND.WL2. Note that we will be doing window probes (and getting
> 0 window responses until application at receiver picks up enough data).
> Remember that SEG.SEQ for this window update would be less than SND.WL1.
> Hence, if we ignore this window update (or use the 3rd check mentioned
> by Alexey), until the receiver retransmits all lost data and its SEG.SEQ
> crosses SND.WL1, we won't ever be able to send anything.

The above assertion is not correct.  Since the sender is sending window
probes, a receiver that can open up the window will also take the 1-byte
window probe.  This means that it will ack that 1 byte.  So Alexey's check
will take the window update in the retransmitted segment since it ack's
new data.
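
A small sketch of this argument follows. The predicate is only my reading
of how the check is described in this thread (an "ACKs new data" clause
plus a third clause that also requires SND.WL1 == SEG.SEQ), so treat its
exact form, the function names and the numbers as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a > b" for 32-bit sequence numbers. */
static bool seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Assumed shape of a check with an "ACKs new data" clause. */
static bool accepts_with_ack_clause(uint32_t snd_wl1, uint32_t snd_wl2,
                                    uint32_t snd_wnd, uint32_t seg_seq,
                                    uint32_t seg_ack, uint32_t seg_wnd)
{
    return seq_gt(seg_seq, snd_wl1) ||
           seq_gt(seg_ack, snd_wl2) ||
           (seg_seq == snd_wl1 && seg_ack == snd_wl2 && seg_wnd > snd_wnd);
}

/* Hypothetical state: SND.WL1 = 100, SND.WL2 = 500, SND.WND = 0, and the
 * peer is retransmitting from sequence number 90 (below SND.WL1).
 * probe_acked says whether the peer acknowledged the 1-byte probe. */
static bool retransmit_opens_window(bool probe_acked)
{
    uint32_t seg_ack = probe_acked ? 501u : 500u;
    return accepts_with_ack_clause(100u, 500u, 0u, 90u, seg_ack, 4096u);
}
```

With the probe byte acknowledged the update is taken even though SEG.SEQ is
below SND.WL1; if the ACK number does not advance, it is not.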

> I don't know what you mean by "old" segments. The basic check of SND.UNA
> <= SEG.ACK <= SND.MAX must have been already made when we do the send
> window update check. Hence, if the receiver sent an ACK which got
> delayed, and during retransmit sent another plain ACK (with a higher ACK
> #), and we updated SND.WL2 and the ACK which got delayed is received,
> then we would not be accepting that old ack as SEG.ACK < SND.UNA (and
> essentially SND.WL2).

The higher ack # assertion is not correct.  I think you've pointed that out
in your first paragraph above.  Not every segment, even a retransmitted
one, needs to ack new data.  It really depends on the traffic pattern.
So TCP cannot really distinguish between the original "old" segment and a
retransmitted segment.  This is the question I tried to ask...

> The original algorithm accepts window updates if:
> 
> SEG.SEQ > SND.WL1
> 
> OR if ((SEG.SEQ == SND.WL1) AND (SEG.ACK > SND.WL2))
> 
> OR if ((SEG.ACK == SND.WL2) AND (SEG.WND > SND.WND))

I don't think the above are the original BSD checks.  Where did you get
them from?  The BSD check (from actual FreeBSD code) is

            (SEQ_LT(tp->snd_wl1, ti->ti_seq) ||
            (tp->snd_wl1 == ti->ti_seq && (SEQ_LT(tp->snd_wl2, ti->ti_ack) ||
             (tp->snd_wl2 == ti->ti_ack && tiwin > tp->snd_wnd))))

Note that the third check is 

	SND.WL1 == SEG.SEQ && SND.WL2 == SEG.ACK && SEG.WND > SND.WND

This is the difference between your revised suggestion and Alexey's
suggestion that Alexey pointed out and I missed.  I originally thought
that they were the same.  But actually, I missed the fact that you omitted
the above SND.WL1 == SEG.SEQ check.  I guess you also missed this
difference C:
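
To make the nesting explicit, here is the same condition wrapped in a
standalone function. This is a sketch rather than the FreeBSD source: the
function name and parameter list are hypothetical, and SEQ_LT is defined
here as a signed-difference comparison, which I am assuming matches the
macro used by the quoted code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed definition: wraparound-safe "a < b" via signed difference. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)

/* Same structure as the quoted check: every clause after the first is
 * nested under snd_wl1 == seg_seq. */
static bool bsd_accepts_window_update(uint32_t snd_wl1, uint32_t snd_wl2,
                                      uint32_t snd_wnd, uint32_t seg_seq,
                                      uint32_t seg_ack, uint32_t seg_wnd)
{
    return SEQ_LT(snd_wl1, seg_seq) ||
           (snd_wl1 == seg_seq &&
            (SEQ_LT(snd_wl2, seg_ack) ||
             (snd_wl2 == seg_ack && seg_wnd > snd_wnd)));
}
```

Because of that nesting, a segment with SEG.SEQ below SND.WL1 is never used
for a window update here, whatever its ACK number or advertised window.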

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 20:27:30 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA29157
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 20:27:29 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA14459
	for tcp-impl-outgoing; Fri, 30 Jun 2000 18:23:33 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id SAA14438
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 18:23:31 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id SAA29290; Fri, 30 Jun 2000 18:23:30 -0400
Received: from cpl-mail1.cpl.novell.com(147.2.71.20) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma029245; Fri, 30 Jun 00 18:22:56 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by cpl-mail1.cpl.novell.com; Sat, 01 Jul 2000 00:22:21 +0200
Message-ID: <395D1D9C.3E5643AB@Novell.COM>
Date: Fri, 30 Jun 2000 16:22:20 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
CC: tcp-impl@grc.nasa.gov
Subject: Re: Send window update algorithm ...
References: <Roam.SIMC.2.0.6.962402369.23143.kcpoon@jurassic>
Content-Type: multipart/mixed;
 boundary="------------3465264FEBFE347CDBDF064D"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------3465264FEBFE347CDBDF064D
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit



Kacheong Poon wrote:

> 
> The above assertion is not correct.  Since the sender is sending window
> probe, if the receiver can open up the window, it also means that it will
> take that 1 byte window probe.  This means that it will ack that 1 byte.  So
> Alexey's check will take the window update in the retransmitted segment
> since it ack's new data.

I knew you would say this :-)). If you were doing "keep alive" style
window probes, you would run into the problem that I mentioned. I
personally don't like the 1-byte window probe scheme.

Thanks,

S.R.
--------------3465264FEBFE347CDBDF064D
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------3465264FEBFE347CDBDF064D--



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 20:40:06 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA29318
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 20:40:05 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA15171
	for tcp-impl-outgoing; Fri, 30 Jun 2000 18:37:38 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id SAA15153
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 18:37:36 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id SAA00488; Fri, 30 Jun 2000 18:37:34 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma000438; Fri, 30 Jun 00 18:36:36 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id PAA15272;
	Fri, 30 Jun 2000 15:36:34 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.84.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id PAA12006;
	Fri, 30 Jun 2000 15:36:34 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5UMaRj682820;
	Fri, 30 Jun 2000 15:36:28 -0700 (PDT)
Date: Fri, 30 Jun 2000 15:36:17 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: tcp-impl@grc.nasa.gov
Cc: RShankar@novell.com
In-Reply-To: "Your message with ID" <395D1D9C.3E5643AB@Novell.COM>
Message-ID: <Roam.SIMC.2.0.6.962404577.10070.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> I knew you would say this :-)). If you were doing "keep alive" style
> window probes, you would run into the problem that I mentioned. I
> personally don't like the 1-byte window probe scheme.

What do you mean by a "keep alive" style window probe?  Do you mean probing
a zero window every 2 hours?  And what is your objection to the 1-byte zero
window probe?

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 20:49:41 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA29395
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 20:49:40 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA15577
	for tcp-impl-outgoing; Fri, 30 Jun 2000 18:46:37 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id SAA15557
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 18:46:35 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id SAA01346; Fri, 30 Jun 2000 18:46:34 -0400
Received: from cpl-mail1.cpl.novell.com(147.2.71.20) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma001324; Fri, 30 Jun 00 18:46:04 -0400
Received: from Novell.COM
	(hema.dnsdhcp.provo.novell.com [137.65.57.9])
	by cpl-mail1.cpl.novell.com; Sat, 01 Jul 2000 00:45:32 +0200
Message-ID: <395D230A.CCF630E5@Novell.COM>
Date: Fri, 30 Jun 2000 16:45:30 -0600
From: Ramesh Shankar <RShankar@novell.com>
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
CC: tcp-impl@grc.nasa.gov
Subject: Re: Send window update algorithm ...
References: <Roam.SIMC.2.0.6.962402369.23143.kcpoon@jurassic>
Content-Type: multipart/mixed;
 boundary="------------3875A60B45A31A282F123F9F"
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

This is a multi-part message in MIME format.
--------------3875A60B45A31A282F123F9F
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Kacheong Poon wrote:
> 
> > Isn't it possible for a receiver who is retransmitting to us to send
> > window updates (when the application at the receiving side picked up new
> > data)? Also, when the receiver is retransmitting, not every segment
> > needs to ACK new data (perhaps we stopped temporarily, perhaps our data
> > didn't make it to the receiver), even if it were bidirectional.
> 

Okay, I thought about this one. Let me know whether this looks correct: 

Assume that segment A is the original one sent by the receiver to us
(sender) and it got delayed. Now receiver is retransmitting and it so
happens that old (delayed) segment A comes to us. Here are the cases:

- New A has a newer ACK (i.e. SEG.ACK > SND.WL2). We don't care what
window this segment has. We blindly use it. (The window could have very
well been shrunk by the receiver).

- New A has same ACK (i.e. SEG.ACK == SND.WL2). In this case, we have
two cases:
	* New A's SEG.WND < old A's SEG.WND
If we were to accept old A first and old A had more window, we won't be
accepting New A's ACK information. However, note that this is an illegal
case anyway as with same ACK#, the receiver has shrunk the window. This
won't be accepted by the existing code. (The only "legal" way to shrink the
window is to send new data or ACK new data). I hope this one is right!

	* New A's SEG.WND > old A's SEG.WND
This is perfectly okay, this is a window update and even if we were to
accept the old A first and update the window (Note that we would have
done this only if old A's window had SEG.WND > SND.WND), we will use the
new A's window information and update the window again.

- Old A has an older ACK. No problem here. Won't accept the ACK anyway.

I think all the cases have been covered.
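
The same-ACK cases can be tried out with a small sketch. It models only
the window-update decision of the proposed check (it does not model data
trimming or duplicate-segment handling), and the struct, function names
and numbers are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct send_window {
    uint32_t snd_wnd; /* SND.WND */
    uint32_t snd_wl1; /* SND.WL1 */
    uint32_t snd_wl2; /* SND.WL2 */
};

/* Wraparound-safe "a > b" for 32-bit sequence numbers. */
static bool seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Applies the proposed check and updates the state if the segment passes. */
static void offer_segment(struct send_window *w, uint32_t seg_seq,
                          uint32_t seg_ack, uint32_t seg_wnd)
{
    bool newer_seq = seq_gt(seg_seq, w->snd_wl1);
    bool newer_ack = seq_gt(seg_ack, w->snd_wl2);
    bool bigger_wnd = seg_ack == w->snd_wl2 && seg_wnd > w->snd_wnd;

    if (!(newer_seq || newer_ack || bigger_wnd))
        return;
    w->snd_wnd = seg_wnd;
    if (newer_seq)
        w->snd_wl1 = seg_seq;
    if (newer_ack)
        w->snd_wl2 = seg_ack;
}

/* Offers two copies of segment A (same SEQ and ACK, different windows) in
 * the given order and returns the resulting SND.WND. */
static uint32_t final_window(uint32_t first_wnd, uint32_t second_wnd)
{
    struct send_window w = { .snd_wnd = 0, .snd_wl1 = 100, .snd_wl2 = 500 };
    offer_segment(&w, 90, 500, first_wnd);
    offer_segment(&w, 90, 500, second_wnd);
    return w.snd_wnd;
}
```

In this model the larger of the two advertised windows wins regardless of
arrival order, which matches the same-ACK cases described above.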

>> The original algorithm accepts window updates if:
>> 
>> SEG.SEQ > SND.WL1
>> 
>> OR if ((SEG.SEQ == SND.WL1) AND (SEG.ACK > SND.WL2))
>> 
>> OR if ((SEG.ACK == SND.WL2) AND (SEG.WND > SND.WND))

>I don't think the above is the original BSD checks.  Where did you get it
>from?  The BSD check (from actual freebsd code) is
>        SND.WL1 == SEG.SEQ && SND.WL2 == SEG.ACK && SEG.WND > SND.WND

I guess you are right. NetBSD 1.41 has the code that I quoted. FreeBSD
3.31 has the one that you quoted. Alexey also pointed this out. I missed
this.

BTW, I am still hoping that someone will enlighten me on why the standard
BSD code does not use window information contained in retransmissions,
i.e. when SEG.SEQ < SND.WL1. I can see only two cases where this can
happen:

* Retransmission
* Out of order segment.

Out of order segments are not the issue. I wonder whether there is some
kind of really messy case in retransmission which prompted the standard
BSD scheme. Or maybe I am paranoid :-)).

Thanks,

S.R.
--------------3875A60B45A31A282F123F9F
Content-Type: text/x-vcard; charset=us-ascii;
 name="RShankar.vcf"
Content-Description: Card for Ramesh Shankar
Content-Disposition: attachment;
 filename="RShankar.vcf"
Content-Transfer-Encoding: 7bit

begin:vcard 
n:Shankar;Ramesh
x-mozilla-html:FALSE
org:Novell Inc.
version:2.1
email;internet:RShankar@Novell.COM
title:Sr. Software Engineer
adr;quoted-printable:;;MS: PRV-H-311=0D=0A1800 South Novell Place;Provo;UT;84606;USA
fn:Ramesh Shankar
end:vcard

--------------3875A60B45A31A282F123F9F--



From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 20:55:29 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id UAA29433
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 20:55:29 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id SAA15605
	for tcp-impl-outgoing; Fri, 30 Jun 2000 18:46:50 -0400 (EDT)
Received: from seraph2.lerc.nasa.gov (firewall-user@guardian02.lerc.nasa.gov [139.88.146.11])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id SAA15601
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 18:46:48 -0400 (EDT)
Received: by seraph2.lerc.nasa.gov; id SAA25845; Fri, 30 Jun 2000 18:46:48 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph2.lerc.nasa.gov via smap (V5.5)
	id xma025524; Fri, 30 Jun 00 18:45:49 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id PAA17853;
	Fri, 30 Jun 2000 15:45:43 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.84.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id PAA13985;
	Fri, 30 Jun 2000 15:45:43 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5UMjfj684596;
	Fri, 30 Jun 2000 15:45:41 -0700 (PDT)
Date: Fri, 30 Jun 2000 15:45:31 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: Ramesh Shankar <RShankar@novell.com>
Cc: tcp-impl@grc.nasa.gov
In-Reply-To: "Your message with ID" <395D1D9C.3E5643AB@Novell.COM>
Message-ID: <Roam.SIMC.2.0.6.962405131.17962.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> I knew you would say this :-)). If you were doing "keep alive" style
> window probes, you would run into the problem that I mentioned. I
> personally don't like the 1-byte window probe scheme.

On second thought, I guess by keep alive style, you mean sending a fake
byte or using an old sequence number to send the zero window probe.  Did I
guess correctly?

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@lerc.nasa.gov  Fri Jun 30 21:30:56 2000
Received: from lombok-fi.lerc.nasa.gov (lombok-fi.lerc.nasa.gov [139.88.112.33])
	by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA29881
	for <tcpimpl-archive@odin.ietf.org>; Fri, 30 Jun 2000 21:30:56 -0400 (EDT)
Received: (from listserv@localhost)
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) id TAA17421
	for tcp-impl-outgoing; Fri, 30 Jun 2000 19:20:42 -0400 (EDT)
Received: from seraph3.lerc.nasa.gov (firewall-user@guardian03.lerc.nasa.gov [139.88.146.12])
	by lombok-fi.lerc.nasa.gov (NASA LeRC 8.9.1.1/8.9.1) with SMTP id TAA17403
	for <tcp-impl@grc.nasa.gov>; Fri, 30 Jun 2000 19:20:41 -0400 (EDT)
Received: by seraph3.lerc.nasa.gov; id TAA04370; Fri, 30 Jun 2000 19:20:41 -0400
Received: from mercury.sun.com(192.9.25.1) by seraph3.lerc.nasa.gov via smap (V5.5)
	id xma004299; Fri, 30 Jun 00 19:20:00 -0400
Received: from sunmail1.Sun.COM ([129.145.1.2])
	by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id QAA26428;
	Fri, 30 Jun 2000 16:19:59 -0700 (PDT)
Received: from jurassic.eng.sun.com (jurassic.Eng.Sun.COM [129.146.88.31])
	by sunmail1.Sun.COM (8.9.1b+Sun/8.9.1/ENSMAIL,v1.6.1-sunmail1) with ESMTP id QAA21465;
	Fri, 30 Jun 2000 16:19:59 -0700 (PDT)
Received: from shield (shield.Eng.Sun.COM [129.146.85.114])
	by jurassic.eng.sun.com (8.10.2+Sun/8.10.2) with SMTP id e5UNJvj690679;
	Fri, 30 Jun 2000 16:19:57 -0700 (PDT)
Date: Fri, 30 Jun 2000 16:19:47 -0700 (PDT)
From: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Reply-To: Kacheong Poon <Kacheong.Poon@Eng.Sun.COM>
Subject: Re: Send window update algorithm ...
To: tcp-impl@grc.nasa.gov
Cc: RShankar@novell.com
In-Reply-To: "Your message with ID" <395D230A.CCF630E5@Novell.COM>
Message-ID: <Roam.SIMC.2.0.6.962407187.19083.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@lerc.nasa.gov
Precedence: bulk

> Assume that segment A is the original one sent by the receiver to us
> (sender) and it got delayed. Now receiver is retransmitting and it so
> happens that old (delayed) segment A comes to us. Here are the cases:
> 
> - New A has a newer ACK (i.e. SEG.ACK > SND.WL2). We don't care what
> window this segment has. We blindly use it. (The window could have very
> well been shrunk by the receiver).

Actually, I think TCP will not take new A (A*)'s window update, even with the
proposed change.  The reason is that A* is a duplicate data segment.  It is
different from your original case when the retransmitted segment fills in a
hole.  So if A is received first and then A* arrives, A* will be dropped.

> - New A has same ACK (i.e. SEG.ACK == SND.WL2). In this case, we have
> two cases:
> 	* New A's SEG.WND < old A's SEG.WND
> If we were to accept old A first and old A had more window, we won't be
> accepting New A's ACK information. However, note that this is an illegal
> case anyway as with same ACK#, the receiver has shrunk the window. This
> won't be accepted by the existing code. (Only "legal" way to shrink the
> window is to send new data or ACK new data). I hope this one is right!

This looks correct.

> 	* New A's SEG.WND > old A's SEG.WND
> This is perfectly okay, this is a window update and even if we were to
> accept the old A first and update the window (Note that we would have
> done this only if old A's window had SEG.WND > SND.WND), we will use the
> new A's window information and update the window again.

Again, TCP will not use A*'s window update because it is a duplicate data
segment.

> - Old A has an older ACK. No problem here. Won't accept the ACK anyway.
> 
> I think all the cases have been covered.

So I think the above cases show that it can be confusing which updates
TCP should or should not take.  Well, to avoid this confusion, how about
not taking any at all C:

> Out of order segments are not the issue. I wonder whether there is some
> kind of really messy case in retransmission which prompted the standard
> BSD scheme. Or may be I am paranoid :-)).

That's why I suggested that Alexey send the proposed change to end2end.
They have old timers there who can tell us if the proposed change misses
some cases.

							K. Poon.
							kcpoon@eng.sun.com





