AWS re:Invent 2022 - Deliver great experiences with QUIC on Amazon CloudFront (NET401)
In this session, Jim Roskind, VP and Distinguished Engineer at Amazon and best known for designing the QUIC protocol, discusses how Amazon CloudFront supports QUIC and helps customers improve performance and the end-user experience by reducing connection times. Learn about improvements offered by QUIC while sending and receiving content, especially in networks with lossy signals and intermittent connectivity. Also, Snap will talk about their journey with AWS and their use of the QUIC protocol. Snap is the company behind Snapchat and Bitmoji, and they build on AWS to create innovative experiences for hundreds of millions of users.
ABOUT AWS Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.
AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
#reInvent2022 #AWSreInvent2022 #AWSEvents
Content
0.54 -> - Thanks a lot for coming.
2.37 -> My name's Jim Roskind
and I'm gonna be speaking
4.11 -> with Mahmoud Ragab about
delivering great experiences
7.98 -> with QUIC on Amazon CloudFront.
10.14 -> If you came looking for
technical background,
12.3 -> like how did this QUIC stuff come to be?
14.31 -> What were the trade-offs?
15.21 -> Why did they decide what to do?
17.46 -> This is a great place to be.
19.35 -> And if you didn't, this
is a good time to leave.
21.54 -> There'll be some geek stuff
and I'm gonna talk really fast
23.49 -> 'cause I have a lot of content.
25.02 -> So be ready.
26.88 -> First the question.
28.62 -> Ah, there we go.
29.453 -> Why is Jim Roskind talking about QUIC?
32.1 -> The answer is I actually
architected, designed,
33.78 -> and led the development of QUIC.
35.67 -> QUIC stands for Quick
UDP Internet Connections
38.16 -> and it evolved into
IETF's HTTP/3 standard,
41.94 -> which came out fairly recently.
44.58 -> If you wonder where I got the background
46.32 -> to do some of this stuff.
47.153 -> In '95 I was working at Netscape
48.75 -> on browser and server security.
50.82 -> I helped design SSL
2.0, a predecessor of TLS 1.0.
54.69 -> Designed signed Java.
56.34 -> Was Netscape's Java security architect.
59.4 -> My joke is I used to fix
bugs that were reported
61.77 -> to the New York Times and
the Wall Street Journal,
63.24 -> but that's a different story.
64.74 -> In 2008 I worked for Google
making Chrome go faster.
68.76 -> I proposed, architected,
designed metrics infrastructure
73.531 -> for Chrome, which was
a really essential point
75.51 -> in the design of this entire protocol.
77.4 -> This protocol is different
from many others because
79.2 -> it was developed out
there on the internet,
81.24 -> understanding what the
internet does to packets.
83.61 -> And I also implemented DNS pre-resolution
85.65 -> and TCP pre-connection.
87.12 -> 2016 I joined Amazon as a
VP/distinguished engineer.
92.43 -> So QUIC I told you it stands
93.9 -> for Quick UDP Internet connections.
96.24 -> The idea is it's a protocol intending
97.83 -> to supplant HTTP/2.
100.304 -> HTTP.
101.16 -> It's gotta be a tongue twister for me
102.24 -> the whole time and more.
104.46 -> It provides cryptographic
privacy and tamper resistance.
107.19 -> Historically comparable
to TLS but now evolved
109.71 -> to actually use full-blown TLS.
111.45 -> We multiplex requests much
like SPDY or HTTP/2 does
115.08 -> to put everything down a single line,
116.43 -> which helps with congestion control
118.92 -> and also improves latency
and reduces the variance
121.14 -> between the multiple requests.
122.79 -> Again, I'm talking...
123.623 -> I'll talk more about this during the talk
125.46 -> and finally it sequenced UDP packets.
128.07 -> So we changed the idea of UDP
being a user datagram protocol
131.52 -> to being a sequence of UDP packets.
134.46 -> But why do we need or want to use it?
136.98 -> Well QUIC is all about speed.
138.69 -> It's all about user latency.
140.76 -> We want faster and more
reliable connections
143.52 -> and fewer round trips.
144.72 -> Round trips are gonna be a big point
146.475 -> and we'll talk about it on the next slide.
147.75 -> We wanna reduce latency and
variance and delivery of bytes
150.87 -> and we wanted better web
performance in congested networks,
153.96 -> which was one of the questions
someone before this talk
156.24 -> started to ask me about.
157.5 -> And Amazon CloudFront supports HTTP/3.
160.29 -> It's available worldwide
with full TLS security
163.2 -> and you should enable it.
166.5 -> The overview of the talk.
167.52 -> I'm gonna talk about the context
168.81 -> and justification for creating this,
170.28 -> including the background and history
172.2 -> of HTTP and latency and bandwidth.
174.12 -> Talk about the problems of
the different versions of HTTP
177.21 -> as well as solutions.
178.29 -> And then we'll talk
179.123 -> about a Snapchat deployment
using Amazon CloudFront.
182.88 -> Finally, we can get to Q and A.
184.5 -> Assuming I talk really fast,
186.15 -> which unfortunately you may see.
188.31 -> Context for developing a new protocol.
189.78 -> The first thing you have to
be is very customer-obsessed.
191.94 -> Measurement focused.
193.29 -> By the way, that's a leadership principle
194.64 -> at Amazon is focus on the customer,
196.83 -> obsess over the customer.
198.3 -> Don't keep the customer
waiting, reduce the latency.
200.46 -> And then to design a protocol,
201.72 -> you really have to stand
202.59 -> on the shoulders of giants.
204.27 -> Use the expertise.
205.62 -> This is the protocol giants
expertise for support of TLS.
209.187 -> TLS was really hard to
design and debug and develop.
212.04 -> You have to use as much as you
can of this brilliant work.
214.68 -> Recent deployments of
SPDY/HTTP/2 involve multiplexing.
217.86 -> Again, tremendous forward progress.
219.9 -> Use those ideas.
221.19 -> Finally use customer metrics
on the infrastructure
223.44 -> to be sure you're going
in the right direction.
224.91 -> Constantly checking.
226.05 -> And lastly, heavily depend on luck
228.21 -> 'cause if skill ever lets you down,
229.35 -> luck is your answer.
230.67 -> Okay, so we get to details
of preexisting problems.
234.15 -> While it all starts with
the elephant in the room,
236.37 -> RTT, Round Trip Time
238.56 -> and it could be up around 400
milliseconds from some points
241.53 -> of presence to actual clients.
243.48 -> And that's almost half a second.
244.65 -> Now it's not just the speed of light.
245.88 -> Speed of light is something,
246.713 -> but the US cross country is
only about 20 milliseconds.
249.81 -> But the truth is a packet going
between routers traversin'
253.14 -> the country actually takes closer
254.85 -> to 60 to 100 milliseconds.
257.85 -> So you have to realize there's
significant latency even here
260.61 -> in the United States,
let alone in, as I say,
263.34 -> I think it's India and the
Ukraine and maybe Russia
266.22 -> where I used to see
extremely large latencies.
268.8 -> If you wanna target delivering
your content to people
270.99 -> in under 200 milliseconds,
272.16 -> you have to go for the best standards.
273.63 -> And that means you can't afford
many of these round trips
276.75 -> and round trips come in very
surprising, interesting places.
279.99 -> The first thing to realize
is historically HTTP/1.0,
283.83 -> the request uses a fresh TCP
connection every single time.
287.58 -> That means that these HTTP
requests are very expensive.
291.9 -> And an example, I can't give
the exact name of the site,
295.65 -> the names have been changed
to protect the innocent,
298.05 -> but this site used 150 HTTP
requests on their homepage.
303.3 -> It was not uncommon.
305.01 -> All right, that's kind of a problem.
306.99 -> So let's look at why
connections are a little bit...
310.26 -> Are we gonna change the microphone
311.61 -> or you gonna throw me off the stage?
313.11 -> - You are one slide ahead,
so you want to move.
315.84 -> - Wow, I should look at that
one instead of this one.
319.26 -> Very tricky.
320.28 -> Have you all been reading
the slide before or after?
323.4 -> Oh crap.
324.851 -> (audience laughs)
326.76 -> I was just testing to see if
anyone was listening to me.
328.98 -> It's good that you were listening.
330.36 -> I'll look at the other screen now.
333.51 -> It's good I have plants in the audience
335.13 -> that set me straight.
336.27 -> All right, so RTT up to 400 milliseconds.
337.89 -> Speed of light.
338.91 -> Gosh, that's what I just said.
340.41 -> Ah, yeah, there's a pattern.
342.75 -> Crap.
343.583 -> Okay, TCP connection establishment.
345.39 -> What does it involve?
346.41 -> It starts with a client.
347.4 -> Typically a browser saying SYN that's,
349.567 -> "Hey, can we talk?"
351.06 -> And the answer comes back SYN,
352.297 -> "Yeah, I'm willing to talk to you."
354.24 -> All right and then we send an
ACK and now we're ready to go.
357 -> We've just wasted one
round trip unfortunately.
359.76 -> That's potentially if we were
in India, 400 milliseconds.
362.67 -> 50 milliseconds median
in the United States.
365.19 -> That's a little bit unfortunate.
367.56 -> Then you'd say, "Well why did you wait?
369.397 -> "Why didn't you just say, Hey,
370.477 -> "I wanna talk and start talking?"
372.63 -> Well, the answer is this thing
called a SYN-flood attack.
374.88 -> SYN-flood attacks historically
are, I'm a bad guy.
377.76 -> I say SYN.
379.38 -> In fact, I say SYN, SYN, SYN, SYN.
381.9 -> And the server on the
other side goes, SYN.
383.85 -> Oh my gosh, you wanna talk?
385.2 -> I better reserve memory get ready.
386.64 -> And I send out an answer.
387.69 -> I've just reserved memory.
388.68 -> Another one, wow, he really is talkative.
391.14 -> Reserve memory, send another one.
392.43 -> Soon I explode.
393.27 -> I ran outta memory.
394.98 -> That's a SYN-flood attack.
395.813 -> That was bad.
396.646 -> Instead they changed it.
397.479 -> They said, "Listen, I am
not gonna answer the phone
400.627 -> "until you respond to this darn SYN-ACK
402.457 -> "and I know you're there."
403.98 -> See, they're not stupid
and so they move forward.
406.98 -> So that was blocking SYN-flood attacks.
408.36 -> Are we out of that?
409.47 -> Not quite 'cause then I go,
411.09 -> I know what I'm gonna do.
411.923 -> Suddenly I'm an attacker here.
413.07 -> SYN, SYN, SYN, SYN, SYN.
414.72 -> By the way, my return address is my victim's.
418.546 -> Oh, oh, it's called a blowback attack.
420.06 -> I can convince a server to
start hammering this other site
423.33 -> even though I'm just sending
424.17 -> little itty bitty packets called SYNs.
426.54 -> That's not very good.
427.53 -> And now you already
realize I'm just a TCP.
429.36 -> I haven't even gotten to TLS yet
431.1 -> and already realize servers
can't be very trusting.
434.07 -> Security is already
lurking at the TCP level.
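The SYN-flood mechanics described above can be sketched as a toy model (a minimal illustration; the function name, backlog size, and return strings are mine, not from the talk or from any real TCP stack):

```python
# Toy model of a SYN-flood: a naive server reserves a half-open
# connection slot for every bare SYN it sees, so an attacker spraying
# SYNs with spoofed return addresses exhausts the backlog, and
# legitimate clients can no longer connect.
def naive_server(backlog_limit, syn_count):
    half_open = 0
    for _ in range(syn_count):
        if half_open >= backlog_limit:
            return "backlog exhausted"  # legitimate clients now refused
        half_open += 1                  # memory reserved per unverified SYN
    return "still accepting"

print(naive_server(backlog_limit=128, syn_count=10_000))  # backlog exhausted
print(naive_server(backlog_limit=128, syn_count=10))      # still accepting
```

The fix the talk describes is exactly the opposite policy: commit no per-connection state until the client proves it is reachable by answering the SYN-ACK.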
437.31 -> And then we go to TLS.
439.02 -> Remember that was a round trip just
440.16 -> to get permission to talk.
441.18 -> And finally the client
says, "Now that I can talk,
444.097 -> "could we talk SSL please?"
445.86 -> And the service says,
"Sure we can talk SSL."
448.08 -> Here is my public key certificate.
450.48 -> That's called a server HELLO.
452.04 -> And I go, "Wow, what a surprise.
453.277 -> "The same one as you gave me yesterday."
454.89 -> Okay, we'll ignore that fact.
456.3 -> Okay, here's a key exchange.
457.89 -> I propose a key and here's the key.
460.56 -> And the service says,
"Let me add some entropy
462.577 -> "'cause I like to be really secure."
464.04 -> Sends it back to me.
464.97 -> So we're probably wasting
two round trips here.
467.31 -> Remember we spent one round trip in TCP,
469.53 -> two round trips here in SSL land.
471.99 -> I get away with one but
we'll call it three total.
474.84 -> And if we're in India, that's
three times 400 milliseconds.
477.75 -> Quick, who does multiplication?
479.16 -> 1.2.
480.453 -> You got it right.
481.286 -> Okay, 1.2 seconds.
482.119 -> That's a lot.
482.952 -> That's a lot to wait.
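The round-trip arithmetic above can be written as a quick back-of-envelope calculation (a minimal sketch; the function name and default RTT counts are mine, matching the one-TCP-plus-two-TLS accounting in the talk):

```python
# Back-of-envelope handshake cost: 1 round trip for TCP's SYN/SYN-ACK/ACK,
# plus roughly 2 more round trips for the classic TLS handshake, all
# before the first byte of the actual HTTP request can be sent.
def connection_setup_delay(rtt_seconds, tcp_rtts=1, tls_rtts=2):
    """Seconds spent on handshakes before any application data flows."""
    return (tcp_rtts + tls_rtts) * rtt_seconds

# A 400 ms round trip, as in the distant-client example, wastes ~1.2 s;
# even the ~50 ms US median costs ~150 ms of pure waiting.
print(connection_setup_delay(0.400))
print(connection_setup_delay(0.050))
```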
484.38 -> So then someone said, "Gee,
what can we do about this?"
487.17 -> I know we'll pipeline.
488.43 -> We'll reuse the connection.
490.32 -> This is called HTTP/1.1.
492.81 -> I'm sorry if you already
know all about this
494.22 -> but I think it's actually
a pretty interesting
496.351 -> and helps you fill up the
background of why we got here
498.51 -> and how we got here.
499.65 -> We tried to reuse the connection.
501.75 -> That's a natural thing.
503.01 -> Unfortunately there are two problems.
504.6 -> The first is in-order response.
506.22 -> Suppose I said, "Send me
a GIF also, by the way,
509.317 -> "look up the search result
and send me another GIF."
512.22 -> Well you'd sorta go, "GIF, sure.
513.667 -> "I'll start pushing that out."
515.61 -> And then the search result,
one minute while I do this.
518.82 -> Unfortunately the the channel goes idle
522 -> and I can't send the other GIF
522.947 -> 'til I get the result of the search
525.63 -> because the rule in HTTP/1.1
is in-order replies.
529.53 -> I make requests they
must come back in order.
532.2 -> Okay, that's a little bit bad.
533.34 -> We have some head-of-line
blocking due to in-order response.
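The head-of-line blocking in that GIF/search/GIF example can be sketched with a toy timeline (illustrative only; the millisecond service times are invented to match the story, not measured):

```python
# HTTP/1.1 in-order replies: each response must finish before the next
# can start, so a slow response delays everything queued behind it,
# even work that is already done.
def in_order_finish_times(service_times_ms):
    """Completion time of each response when replies must come back in order."""
    finish, t = [], 0
    for s in service_times_ms:
        t += s
        finish.append(t)
    return finish

# Fast GIF (10 ms), slow search (500 ms), fast GIF (10 ms): the second
# GIF takes only 10 ms of work but is not delivered until 520 ms.
print(in_order_finish_times([10, 500, 10]))  # [10, 510, 520]
```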
536.19 -> But there was something even worse.
537.45 -> And that is, some people
thought of this differently.
540.06 -> Some people thought the way it would work
541.47 -> is I'd send a request,
543.15 -> he would send me my response,
545.16 -> and then I could clear the buffer,
547.02 -> and now I'd send another one.
548.64 -> Other people realize
it would be really cool
550.35 -> to say, "Request, request, request,"
552.45 -> and get three responses back eventually.
555.36 -> So some of those servers
deleted the input buffer, gulp.
560.43 -> Okay?
561.263 -> Some of them didn't.
562.26 -> The way you could find out is
if your site stopped working
565.11 -> then you knew you were in trouble.
566.31 -> And what most people decided
is to not use HTTP/1.1.
569.07 -> The one time you could use it
570.96 -> was when you're in a data center
572.37 -> and you controlled both ends.
573.66 -> In general, it was too
dangerous to use effectively.
576.78 -> The in-order replies also are a problem.
579.48 -> As I say, even if you tried to use it,
582.03 -> you didn't really like the fact
583.5 -> that it got hung up on a slow response.
586.89 -> Well now that HTTP/1.1 didn't really solve
590.55 -> all of the problems of the world,
592.11 -> people tried to start working around it.
594.42 -> The first and obvious thing is,
595.74 -> why don't I just open
several parallel connections?
598.47 -> Why do I wait for this thing to be usable?
600.84 -> So I'll open a lot.
603.03 -> But unfortunately, I
think I mentioned before,
604.89 -> that one site requested 150 resources.
608.01 -> Now a lot of servers aren't really ready
609.69 -> for 150 simultaneous
requests from one user.
613.02 -> See, that was behind the back one, okay?
614.82 -> They weren't ready for
a request from one user
616.95 -> and so they did little
negotiations back and forth.
619.38 -> And so browsers agreed,
620.37 -> listen, we won't send
more than six at a time.
623.67 -> Okay, that calmed people
down, but still we have six.
625.98 -> That's still quite a few.
628.37 -> And then the servers that
really were big places,
631.32 -> they said, "I really wanna do it
632.587 -> "and I was willing to pay the money
633.547 -> "and buy the bandwidth
and buy the servers.
636.367 -> "I'll have www.example.com.
638.647 -> "I'll have images at example.com.
640.177 -> "I'll have news at example.com.
642.847 -> "Videos at example.
643.687 -> "I'll have a multitude of domains."
645.48 -> And in fact, that's what
this interesting site
647.04 -> did with 150 resources.
648.39 -> They actually got 17 distinct
domains to serve things.
652.59 -> Whew, that was a lot.
653.52 -> By the way, if you do
the fast multiplication
654.87 -> of six times 17,
655.88 -> do you know what that is?
657.51 -> 102.
658.41 -> So 102 isn't the full 150.
659.78 -> So they still had some
things that were reusing
662.52 -> the connections but still they got
663.96 -> a lot of parallel bandwidth.
665.46 -> Unfortunately all those
connections were sharing
668.61 -> the same physical connection
and causing congestion.
671.73 -> They're all fighting with each other.
673.41 -> And you never know which one
is going to lose a packet
676.08 -> and which one is gonna slow down.
677.85 -> And if you're unlucky,
678.683 -> it's gonna be the JavaScript
that you dearly needed.
681.39 -> This was undesirable, but that's not all.
685.92 -> It turns out when you
start these connections,
687.93 -> the truth is historically
the good old days
690.51 -> you got to send the two packets.
692.4 -> If you got an ACK back from that,
693.87 -> you would send four packets.
695.4 -> If you got ACKs back from
those, you would go to eight.
697.38 -> This is called slow start.
698.76 -> Why they call it exponential
growth slow start
700.74 -> is a different issue.
701.7 -> But that's what they called it.
703.249 -> And this is slow start at the
start of a TCP connection.
706.11 -> And unfortunately that
meant that if I need
708.36 -> to send say 20 packets in a connection,
710.43 -> which is actually a common
sort of thing to do,
712.47 -> I had to spend three
or four round trips.
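The slow-start arithmetic above can be checked with a short sketch (the function name is mine; the classic initial window of two packets and per-round doubling follow the talk's description):

```python
# Slow start: the congestion window begins tiny (classically 2 packets)
# and doubles each round trip as ACKs come back, so even a modest
# 20-packet response costs several round trips to deliver.
def round_trips_to_send(total_packets, initial_cwnd=2):
    sent, cwnd, rounds = 0, initial_cwnd, 0
    while sent < total_packets:
        sent += cwnd   # one round trip delivers a full window
        cwnd *= 2      # window doubles on the ACKs
        rounds += 1
    return rounds

# 2 + 4 + 8 + 16 packets: four round trips to move 20 packets.
print(round_trips_to_send(20))  # 4
```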
714.75 -> TCP tried to help us by
bumping it from say 12 to 16.
719.31 -> But still I had these parallel
connections all fighting
722.25 -> for bandwidth and worse than that,
724.83 -> the first connection says,
725.827 -> "Hi, I'm Chromium, and by the
way, these are my cookies."
729.96 -> The second one says, "Hi, I'm Chromium
732.727 -> "and these are my cookies."
733.77 -> Anyone see a similarity?
735.45 -> Yeah, they're all saying the same thing.
736.98 -> Multitude times.
737.91 -> 100 times in parallel.
739.68 -> What a waste of bandwidth.
742.02 -> So this is a problem.
743.34 -> Kudos to Mike Belshe and Roberto Peon
745.68 -> for driving forward to SPDY.
747.323 -> SPDY is the basis of HTTP/2.
749.88 -> Each GET is input.
750.99 -> It's put into this multiplexed stream
753.78 -> and the idea is that now I don't have
756.93 -> to worry about in-order response.
758.46 -> Remember that example?
759.293 -> I said get me a GIF, get me a
search result, get me a GIF?
761.61 -> It could start returning the GIF,
763.5 -> then it would send off
their search result,
765.21 -> start returning the second GIF.
766.53 -> Oh, I have the search result?
767.76 -> Let's stop sending the GIF.
769.11 -> Let's inject some search results.
770.97 -> They're multiplexed.
772.05 -> We can put them in and tease
them out automatically.
774.51 -> We can get out-of-order responses.
776.13 -> We can prioritize
JavaScript and style sheets.
779.25 -> HTTP/2.
780.083 -> Very cool.
780.916 -> Very clever, and SPDY/HTTP/2
also reduces the redundancy.
786.69 -> They did data compression
of those headers.
789.48 -> It no longer had to say each time,
791.587 -> "I am a Chromium browser
and I have these cookies."
794.67 -> It says that once and
it keeps referencing it.
796.8 -> That's data compression.
798.03 -> That's adding efficiencies.
799.59 -> Wow, this is really helping the world.
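The redundancy that header compression removes can be sketched numerically (illustrative only; the 700-byte header size and 2-byte reference cost are invented round numbers, not actual HPACK encoding):

```python
# With plain HTTP/1.x, identical headers (user agent, cookies) are
# re-sent in full on every request. HPACK-style compression in HTTP/2
# sends them once, then later requests just reference the stored entry.
def bytes_on_wire(header_bytes, requests, compressed):
    if compressed:
        # first request carries the full headers; later ones send a
        # tiny index reference (assumed 2 bytes here for illustration)
        return header_bytes + (requests - 1) * 2
    return header_bytes * requests

# 700 bytes of headers across 100 requests:
print(bytes_on_wire(700, 100, compressed=False))  # 70000 bytes repeated
print(bytes_on_wire(700, 100, compressed=True))   # 898 bytes total
```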
803.31 -> And the stream shared a single flow,
804.78 -> a single congestion window.
806.25 -> So now the whole stream
would slow down or go forward
809.85 -> and we'd at least get
to prioritize the things
811.74 -> that we want to put on the
stream as soon as possible.
814.2 -> No fighting and variance
among the streams.
816.9 -> But there are weaknesses.
818.31 -> See, it's never as pretty as you hope.
820.74 -> Okay, the weakness here is
TCPs initial congestion window
823.08 -> I said was two packets.
825.06 -> That was actually terrible.
826.08 -> Contrast that with the
six parallel connections.
828.6 -> Six parallel connections could easily send
830.64 -> two, two, two, two, two.
832.182 -> 12 packets while poor
little SPDY is sending two.
836.07 -> Next thing, SPDY sends
four, this sends 24.
839.49 -> You suddenly realize
the parallel connections
841.02 -> are ramping up much faster.
842.4 -> This is unfair.
843.51 -> This was causing people to not want SPDY,
845.94 -> even though it's a great thing.
847.83 -> And then that's a strange thing.
850.41 -> Why should we be so nice?
852.03 -> This old-fashioned method,
853.29 -> which is wasting the
bandwidth of the universe.
855.6 -> Then when we get beyond that
into the steady state of TCP,
858.63 -> who knows what the steady state of TCP is?
861.316 -> There's a little bit of a sawtooth wave.
862.26 -> Who's seen the sawtooth?
863.82 -> Well, some people have.
864.72 -> The interesting thing is we
have what's called AIMD.
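The AIMD (Additive Increase, Multiplicative Decrease) behavior behind that sawtooth can be sketched as a simple simulation (a toy model; the starting window and loss schedule are invented for illustration, not real network measurements):

```python
# AIMD, TCP's classic steady-state rule: grow the congestion window by
# one packet per round trip (additive increase), and cut it in half
# when a loss is detected (multiplicative decrease). Periodic losses
# produce the familiar sawtooth window trace.
def aimd(rounds, loss_every, cwnd=10):
    history = []
    for r in range(1, rounds + 1):
        if r % loss_every == 0:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease on loss
        else:
            cwnd += 1                 # additive increase per RTT
        history.append(cwnd)
    return history

# A loss every 5 rounds: the window climbs, halves, and climbs again.
print(aimd(rounds=10, loss_every=5))
```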