Plenary session
Monday, 14 May 2018
4 p.m.
CHAIR: A reminder to everyone: PC nominations are still open until tomorrow afternoon, so e‑mail a short bio of yourself to pc [at] ripe [dot] net. We would love to have you join the Programme Committee, who help bring you awesome speakers, as you have already seen today and will see more of.
We also get the ability to banter while waiting for microphones to get hooked up. So, it's got so many different functions. And as a reminder also, rate the talks. That is the best way to get feedback on what you think about the talks to the PC, so that we can make sure that we have the most interesting and relevant talks for the entire community.
And all right, our first talk: Pere will talk to us about the challenges of network traffic classification with NetFlow ‑‑ I'm never sure how to say this, IPFIX.
PERE BARLET‑ROS: I call it IPFIX. So, good afternoon. First of all, I would like to thank the RACI programme for giving me the opportunity to present our research to the RIPE community. In this presentation, I will talk about the challenges of traffic classification with flow level data, NetFlow or IPFIX. This is joint work with my colleagues on the slide here.
So, let me start by defining what we understand by traffic classification. In this presentation, I will use the term traffic classification to refer to the process of identifying the application that generated the traffic in the network. I guess today I don't have to convince you that traffic classification is useful for network operation and management, because you know it much better than me. And probably you already use some sort of traffic classification in your filtering and network monitoring tools. So let me describe very briefly the different techniques that these network monitoring tools use to identify the applications that generate the network traffic.
The first and simplest approach is the one based on the port numbers. This approach has the advantage that it's computationally very lightweight and does not require access to the packet payloads. This is why most NetFlow‑based products in the market still use the port numbers to classify the network traffic, despite the fact that using the port numbers is known to suffer from low accuracy and completeness. So, with the port numbers, we not only get incorrect results, but there is also a lot of traffic that we cannot classify.
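The port-based approach described above can be sketched as follows. This is a minimal illustration, not any product's implementation: the port table, the flows and their "true" applications are all made up, chosen only to show how a port lookup can be both incomplete (unknown ports) and inaccurate (an application running on another application's port).

```python
# Hypothetical port table with only a handful of well-known ports.
WELL_KNOWN_PORTS = {25: "smtp", 53: "dns", 80: "http", 123: "ntp", 443: "https"}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Guess the application from either port, or return 'unknown'."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


# Made-up flows: (src_port, dst_port, true application).
flows = [
    (51234, 53, "dns"),
    (40000, 80, "http"),
    (50001, 80, "skype"),         # non-web application using port 80
    (51413, 6881, "bittorrent"),  # port not in the table
    (44321, 443, "https"),
]

guesses = [classify_by_port(s, d) for s, d, _ in flows]
correct = sum(g == truth for g, (_, _, truth) in zip(guesses, flows))

# Completeness: share of flows that received any label at all.
completeness = sum(g != "unknown" for g in guesses) / len(flows)
# Accuracy: share of all flows that received the right label.
accuracy = correct / len(flows)
print(f"completeness={completeness:.0%} accuracy={accuracy:.0%}")
```

With these invented flows, one flow stays unclassified and one is mislabelled, which is the pair of failure modes the talk refers to as low completeness and low accuracy.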
The second technology I want to talk about is deep packet inspection, or DPI. DPI has the advantage of having much better accuracy than the port numbers, but this increase in accuracy comes at a price, because the algorithms used by DPI are expensive and therefore you usually require a hardware appliance that needs to be deployed in your network, and these appliances are expensive and difficult to deploy.
Also, since DPI obviously needs access to the packet payloads, it might involve some privacy issues, and this is especially relevant now that the GDPR is coming into force in a few days.
Also, since this technique needs to inspect the payloads, it cannot be directly used to identify encrypted traffic.
So, in order to address the limitations of DPI, the community in the last 10 or 15 years has proposed a large number of techniques that use machine learning to identify the applications behind the network traffic. These have shown very high accuracy, which is comparable to what you can obtain with DPI, but they have much lower computational cost. Also, most of the proposals do not need access to the packet payload, so this means that machine learning could potentially be useful to classify encrypted traffic.
However, as we will discuss later, this requires usually an expensive training phase.
So, although this has shown good results in the literature, we observe that, in practice, current products don't use it very much, and if they use machine learning, they use it only as a complement to other engines like DPI.
We believe that this low adoption is a result of the fact that these algorithms, as proposed in the research works, suffer from some practical problems that make their deployment difficult in production networks. So, we identified three main reasons that we believe are slowing the adoption of machine learning in products. We name these problems, respectively, the deployment problem, the maintenance problem and the validation problem, which are those that I will describe in more detail in the next few slides.
So, let's start with the deployment problem. Here, we refer to the problem that current solutions are difficult to deploy, and this is because even machine learning solutions usually require access to packet level data in order to compute, for example, the features that will later be used as predictors of the application. For example, several works have been using the lengths of the first few packets of a connection, and this is information that is usually not available in flow records. So we thought that if we could make these algorithms work with flow level data such as NetFlow or IPFIX, then we could make these solutions easier to deploy in production networks, especially if we were able to support packet sampling, because network operators usually tend to use aggressive sampling rates in order to reduce the requirements on their network equipment.
So, this is the question that we tried to address: whether it's possible to adapt these machine learning proposals to work with flow level data. In the next slides I will show the results for a particular machine learning technique, which is the one that is the most popular and the one that has shown the best trade‑off between accuracy and training times.
So in order to test it, we used NetFlow version 5, and this is because version 5 is where we have the least information for machine learning to predict the application.
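To make concrete how little a NetFlow version 5 record offers a classifier, here is a hedged sketch of a feature vector built only from fields that a v5 flow record carries (ports, protocol, TCP flags, ToS, packet and byte counters, first and last timestamps). The class name `FlowV5`, the `features` helper and the particular choice of derived features are assumptions for illustration; the talk does not list the exact features that were used.

```python
from dataclasses import dataclass


@dataclass
class FlowV5:
    """Subset of the fields of a NetFlow v5 flow record."""
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 for TCP
    tcp_flags: int  # cumulative OR of the TCP flags seen
    tos: int
    packets: int    # dPkts
    octets: int     # dOctets
    first_ms: int   # uptime at first packet, in milliseconds
    last_ms: int    # uptime at last packet, in milliseconds


def features(flow: FlowV5) -> list[float]:
    """Build a feature vector from flow level information only."""
    duration_s = max(flow.last_ms - flow.first_ms, 0) / 1000.0
    avg_packet_size = flow.octets / flow.packets if flow.packets else 0.0
    return [
        flow.src_port, flow.dst_port, flow.protocol, flow.tcp_flags,
        flow.tos, flow.packets, flow.octets, duration_s, avg_packet_size,
    ]


example = FlowV5(50000, 443, 6, 0x1B, 0, 10, 5000, 1000, 3000)
print(features(example))
```

Note what is missing compared with packet level approaches: there are no per-packet lengths or inter-arrival times, only aggregates over the whole flow.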
So, in order to evaluate this, we collected 7 full packet traces at the UPC access link and we labelled them using DPI in order to have a ground truth to check the accuracy of the ML method. In the table we have the results obtained. In the first column we have the accuracy in terms of flows for the 7 traces. Basically, this tells us the share of flows that we were able to classify correctly using the decision tree, and as we can see, accuracy was quite good. We got results which were around 90%, which is a good result for this method. The accuracy that we got using only the port numbers, which is in the last column, was only around 10%, even lower in some cases. So this also tells us that the dataset that we used was quite challenging to classify, in particular for the case of the port numbers.
So, the results I presented were with plain NetFlow without applying sampling, so the next question that we tried to answer is, what will happen if we apply sampling to our input NetFlow? In the figure, we can see the accuracy as a function of the sampling rate, and here it is clear that the results were not very good. Even with a sampling rate of 10%, we got an accuracy that was a bit above 40%, which is not a very good result.
So, we tried to investigate the reasons behind this drop in accuracy, and we identified three sources of inaccuracy that might explain this poor result. The first is the error introduced by the estimation of the traffic features: since we were using sampled traffic, we first need to estimate the actual values before giving them to the decision tree. The second reason was the change in the flow size distribution, because when using sampling, small flows tend to disappear, so this resulted in a training set that had different properties compared to the traffic to which we wanted to apply this machine learning method. And finally, we also observed an increase in the flow splitting probability due to the internal timers implemented by NetFlow.
So, in order to mitigate these three sources of inaccuracy, we had a very simple but effective idea, which was basically to apply sampling also in the training set. With this simple change, we got a significant increase in accuracy, reaching values that were comparable to the accuracy that we obtained using NetFlow without sampling. For example, with a sampling rate of 1 over 1,000, the accuracy was a little bit above 80%, which is a good result considering that the sampling rate was quite low, and it's also a good accuracy for a machine learning method.
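The effect of sampling on the training set can be illustrated with a small simulation. This is only a sketch under stated assumptions: packets are sampled independently at random (real routers may sample deterministically), the flow sizes and labels are invented, and no classifier is trained here. It shows two of the effects mentioned above: small flows tend to vanish, and feature values must be estimated by scaling. Applying the same `sample_flows` step to the training data is what "sampling the training set" amounts to.

```python
import random


def sample_flow(packets: int, rate: float, rng: random.Random) -> int:
    """Number of packets of one flow that survive random packet sampling."""
    return sum(rng.random() < rate for _ in range(packets))


def sample_flows(flows, rate, rng):
    """Sample (packets, label) flows; flows with no sampled packet vanish."""
    sampled = []
    for packets, label in flows:
        kept = sample_flow(packets, rate, rng)
        if kept > 0:
            sampled.append((kept, label))
    return sampled


def estimate_packets(kept: int, rate: float) -> float:
    """Scale a sampled packet count back to an estimate of the real count."""
    return kept / rate


rng = random.Random(42)
# Invented training set: many small flows and a few large ones.
training = [(3, "dns")] * 500 + [(2000, "bittorrent")] * 20
rate = 0.01

sampled_training = sample_flows(training, rate, rng)
surviving_dns = sum(label == "dns" for _, label in sampled_training)
surviving_bt = sum(label == "bittorrent" for _, label in sampled_training)
print(f"dns flows: 500 -> {surviving_dns}, bittorrent flows: 20 -> {surviving_bt}")
```

A model trained on `sampled_training` rather than on `training` sees the same distorted flow size distribution that it will later meet in sampled production traffic.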
So, this brings us to the second problem, which is what we call the maintenance problem. This problem is related to the fact that machine learning solutions need a training phase, and this training needs to be repeated every time there is an important change in the traffic. Also, if there are application updates or if new applications appear, we need to retrain the ML model, and this usually requires some human intervention, which makes the deployment of this kind of technology difficult in a production network.
So, we thought that if we could make this retraining process more automatic, in a way that is computationally viable and does not require human intervention, then we could make the deployment of this kind of solution much easier for operators.
So, here we have the architecture of our solution. Our main input is still the NetFlow traffic, but apart from the NetFlow traffic, we collect a very small sample of packet level traffic. We do it continuously, and then we pass this small sample to a DPI tool, so we can get the results a DPI tool would produce, and then we can compare them, for these samples, with the results obtained using our ML method, which takes the NetFlow as input.
So then we can check the accuracy in real time, and at some point, if we see the accuracy drop below a particular threshold, we can retrain the model automatically using the same samples that we have collected and labelled using DPI.
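The retraining loop described above can be sketched as follows. Everything here is a toy stand-in: the "model" is a lookup table rather than a real classifier, the feature is just a port number, and the batches of DPI-labelled samples are invented. Only the control flow (measure accuracy against the DPI labels, retrain when it falls below a threshold) reflects the talk.

```python
def accuracy(predicted, truth):
    """Share of predictions that match the DPI labels."""
    hits = sum(p == t for p, t in zip(predicted, truth))
    return hits / len(truth) if truth else 1.0


def train(samples):
    """Toy 'model': remember the label seen for each feature value."""
    table = {feature: label for feature, label in samples}
    return lambda feature: table.get(feature, "unknown")


def monitor(batches, model, threshold):
    """Check each DPI-labelled batch and retrain when accuracy drops."""
    retrainings = 0
    for batch in batches:
        truth = [label for _, label in batch]
        predicted = [model(feature) for feature, _ in batch]
        if accuracy(predicted, truth) < threshold:
            model = train(batch)  # retrain on the freshly labelled samples
            retrainings += 1
    return retrainings


# Invented batches of (feature, DPI label); the traffic mix changes once.
old_mix = [(80, "http"), (53, "dns")] * 5
new_mix = [(80, "http"), (53, "dns"), (8080, "new-app")] * 5

retrainings = monitor([old_mix, new_mix, new_mix], train(old_mix), threshold=0.94)
print(retrainings)
```

In this toy run the model is retrained once, when the new application first appears, and then stays above the threshold.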
We call it lightweight DPI because the volume of data that we have to inspect is so low that we can do it easily with commodity hardware.
So, we implemented this in practice, and we deployed it in the Anella Cientifica network. In order to check the accuracy of the solution, we collected a 14‑day trace and labelled it using DPI, and in the figure we can see the accuracy over time that our system obtained using three retraining thresholds: 94%, 96% and 98%. So this means that if the accuracy at some point was below the threshold, then the system retrained automatically.
For the lowest threshold, we see that we required 5 retrainings during the 14 days to sustain good accuracy. For the case of the 98% threshold, the number of retrainings was a little bit above 100.
Finally, the last problem I want to talk about is the validation problem. The validation problem is one of the most serious problems in the area of traffic classification, basically because all works in this area use private datasets to check the accuracy of the proposals. This means that since all proposals use different input data, it's very hard to tell which proposal works better, and it also makes it difficult to validate and reproduce the results in the scientific papers.
So here our proposal was to try to make public datasets that contain the payloads. If we can do that, then these datasets could be used as a common benchmark that would allow the validation and comparison of different works.
Making a dataset public that contains the payloads is challenging, as payloads are private information. Instead, what we did was to build a real testbed, and we generated the traffic ourselves, running the most popular applications. In order to do it, we tried to emulate as much as possible the behaviour that a real person would have when using these applications. So we created accounts in the most popular services, and we ran the applications trying to simulate what a real person would do. So we made Skype calls, we, for example, rated them, we posted comments, we also used Facebook and posted comments on Facebook. We played games and so on.
So, as a result, we made public a dataset that contains the full packet payloads, and not only did we do that, but we also labelled the dataset. Labelling was easy because we were collecting the data on the same computer where we were generating the traffic. For that we used a tool called VBS, which basically labels the traffic based on the application that opened the socket in the system. So with this we had a dataset that was better labelled than if we had used DPI instead. Obviously this dataset has limitations: although the traffic is real, we generated it ourselves, so it's not real user traffic, and the traffic mix in this dataset is probably not representative of the traffic that can be found in other networks. Anyhow, we made the dataset publicly available, and I think it has had an important impact on the community, because more than 200 researchers have asked us for access to it and so far it has been cited in more than 100 scientific publications.
Overall, the dataset contains 750,000 flows, which account for about 50 gigabytes of data, and we labelled the dataset at three different levels. The first is the application protocol level, so we labelled it according to whether it was generated by DNS, HTTP, SMTP and so on. We also labelled it at the application level, so we included labels such as BitTorrent, Dropbox, Skype and so on. And finally, for HTTP traffic, we also included the service that was generating the traffic.
Apart from making that dataset public, since we had a dataset that was very well labelled, we used it to compare the accuracy of different DPI tools. This was quite interesting because so far the relative accuracy of these tools was not known. In our test, we used two commercial tools, which are PACE and NBAR, and the open source tools that are listed here.
Here I present just a summary of the results that we obtained. First, at the application protocol level, we saw that the accuracy of most tools was quite good. It ranged between 70 and 100%. In this case, nDPI and Libprotoident were the best performing tools. Only Libprotoident was able to identify some encrypted protocols, such as encrypted e‑mail. The other tools were not able to classify any of the encrypted traffic. And the only tool that obtained poor accuracy was L7‑filter, which suffered from a large number of false positives.
Regarding the application level, here we observed a decrease in accuracy of between 20 and 30% compared to the results we obtained at the protocol level. In this case, PACE and nDPI were the tools that obtained the best results. However, it is also worth noting that Libprotoident obtained similar accuracy, and this is interesting because Libprotoident is a tool that only uses the first 4 bytes of the payload.
More surprising were the results with NBAR which showed very low performance and was unable to classify most of the applications.
And finally, at the web service level the results were much worse. Most of the tools were not able to classify any of the web traffic, or the service behind the web traffic. Only PACE and nDPI could identify around half of the web services, and only 6 of them with an accuracy that was greater than 80%.
And finally, to conclude, we have seen that DPI products are expensive and difficult to deploy. However, we argued that traffic classification with sampled NetFlow is possible and is easier to deploy, and the volumes of data involved in sampled NetFlow traffic are very low, which means that a traffic classification system could also be offered as a service in the Cloud. With this in mind, we implemented our traffic classification technology following the principles I described in this presentation, and we received funding from the European Union to turn this technology into a commercial product. As a result, we incorporated a company, which is a spin‑off of our university, which is offering traffic classification but also monitoring in general, such as anomaly detection and mitigation, as a service and also as an on‑premise product.
So, with this, I conclude my presentation. I don't want to bore you by showing a demo of the system, but if somebody is interested, at this website you can access an online demo where you can see what can be obtained in terms of traffic classification using NetFlow data.
And that's it. Thank you very much for your attention and I'll be happy to take questions.
(Applause)
BRIAN NISBET: Thank you very much. Are there any questions?
ARTYOM GAVRICHENKOV: Thank you for your presentation. I have a question about your datasets. As I was looking through them, I found out that each of the datasets has, like, four definitions, which include process name, HTTP URL, referer and content type. Could you please explain how you get that information, like URL and referer, for Facebook or Twitter traffic? Because it's encrypted, or maybe I'm missing something.
PERE BARLET‑ROS: I don't recall exactly for the HTTPS traffic. This dataset, I think, was published in 2015, so probably there was a fraction of the traffic that currently is encrypted that was not at that time. Also, you have to consider that we generated the traffic from the same computer where we collected it, so we had access to more information. And now I don't recall the details, because this was done with this tool that was not made by us, which is called VBS, and it was done by one of the authors, who was from a university in Denmark, so I don't recall the details exactly.
ARTYOM GAVRICHENKOV: Okay, let's shift the question the following way: how much of the training actually depends on features like process name and URL, which are not available to an ISP or a third party?
PERE BARLET‑ROS: None of them. So basically for the training, we are only using information that is available in NetFlow records. So, the accuracy figures I include here are only for the DPI tools.
ARTYOM GAVRICHENKOV: Okay. Thank you.
PERE BARLET‑ROS: Does that answer your question or not?
ARTYOM GAVRICHENKOV: Or maybe we'll discuss it later.
BRIAN NISBET: This is the kind of thing that you'd best discuss in a coffee break. Absolutely. Any other questions? This is really quite a big room! Thankfully, I am built to scale. So, no other questions. Thank you very much.
(Applause)
BRIAN NISBET: So, Leslie actually introduced herself. This is Leslie, I am Brian, we are your chairs for the next hour or so, depending on how long Erik speaks for. Our next presenter is Erik Bais and he'll be speaking about why we are still seeing amplification DDoS traffic. I assume the answer is because there are mean people on the Internet, but his answer might be a little more detailed.
ERIK BAIS: Well, you are actually not far off.
I think the introduction slide speaks for itself.
How we got to this. I work for a Dutch Internet provider. A couple of years ago, we created a rating system to see if we could actually predict where to expect DDoS traffic from. There is a link there to the presentation I did at RIPE 72, and since then, we recreated the complete API, and it's open to the public. If you want to have a user ID for it, you can just send an e‑mail to us. I'll show the results of the API later in the presentation.
So, amplification DDoS attacks. Still an issue today. NTP, DNS, SSDP, Chargen, more recently the whole Memcached stuff. And if you look at the traffic during a DDoS, you'll see that the larger DDoS traffic originators are still basically the same as four years ago. Now, obviously one of the things that local police are trying to do is getting the DDoS sites out of the air, but it looks a bit more like this, [whack a mole], and as facilitators, the question is, are they actually the real issue? Because hunting them is basically [whack a mole] and we are not getting anywhere.
So, some of the applications are actually worse than others. As in an earlier presentation ‑‑ here ‑‑ you can see the list here, and the list is actually published by US‑CERT, and they actually have a complete list here of what the amplification factor is for all these different vulnerable services out there on the Internet. And some of them are actually worse than others. As you can see here, Memcached tops it all. Another nice one here, Chargen, NTP... and where is DNS? There. And that's actually what we used in our rating system as well. We used the results from this list to do our magic, to create the rating system.
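The amplification factor mentioned above is commonly expressed as the ratio between the size of the response a vulnerable service sends and the size of the spoofed request that triggered it. The sketch below only shows that arithmetic; the request and response sizes and the attacker bandwidth are invented values, not figures from the published list.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bandwidth amplification: response payload size over request size."""
    return response_bytes / request_bytes


def reflected_bandwidth(attacker_bps: float, factor: float) -> float:
    """Traffic arriving at the victim for a given spoofed request rate."""
    return attacker_bps * factor


# Invented example: a 60-byte query that triggers a 3,000-byte answer.
factor = amplification_factor(60, 3000)
victim_bps = reflected_bandwidth(10_000_000, factor)  # attacker sends 10 Mbit/s
print(f"factor={factor:.0f}x, victim receives {victim_bps / 1e6:.0f} Mbit/s")
```

The higher the factor of a protocol, the less bandwidth an attacker needs to spend per vulnerable server, which is why a rating can reasonably weight services by this number.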
And this is basically the take that I want to bring on here. We are our own biggest issue. We deserve every DDoS attack that we get. And the reason is, we still don't clean up our networks. And yes, some are actually worse than others. But the basic fact is, if we fix our shit with our customers, there is no vulnerable server to abuse to actually send a DDoS attack. Let this sink in. Because this is the issue. We are just lazy bastards.
So what are we doing here? Are we preaching to the choir? Well, not really. Because if we look at EPF, for instance, where you think those are the people that actually want to peer, the number 10 of the naughtiest networks on this list was actually a RIPE regular. So maybe I need to do this presentation again at EPF, but we still have some parties here in this room that actually have some issues in their network as well.
And some of us are actually naughtier than others. So this is the result of some of the API responses.
And for instance, OVH here in the middle, they have some interesting results, and yes, they had a really big DDoS attack, but if you look at it, their own customers could actually cripple their whole network, and what do you do then? How do you filter that?
Or here... look at this... SSDP, 120,000 vulnerable servers in your own network. Let's look at some others. Deutsche Telekom, Liberty Global, RCS. So we know that the Germans are very sensitive, but come on, 45,000 NTP servers in your network, how much time do you want to have?
If you look here: DNS servers, open DNS servers, open NTP servers. Here, Memcached, NTP, it's all there, and those are your own customers. How difficult is it to actually send your own customers a small e‑mail saying, hey, you have a vulnerable server in your network and it is being abused for DDoS attacks. Could you close it down, fix your server, fix the configuration? You will do yourself a favour, because you will actually have the bandwidth to use instead of having somebody else use your bandwidth.
And if you actually look at the rating system that we created, we are looking at it per AS and the announced IP addresses per AS. And that gives some interesting results, because sometimes the AS that we peer with is not the AS that actually originates the problem. So here you see some traffic from, in this case, MegaFon, which is actually reasonable, but the actual problem is behind it, at KOZA telecom, where they have 17,000 NTP servers, and that was actually causing the issue.
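A per-AS rating of the kind described above could be approximated by aggregating the number of vulnerable hosts per origin AS, weighted by how bad each service is. The following is a guess at the general shape, not the speaker's actual rating system: the weights are placeholders, the AS numbers come from the range reserved for documentation, and the counts are invented.

```python
from collections import defaultdict

# Placeholder weights standing in for per-service amplification factors.
WEIGHTS = {"memcached": 100.0, "ntp": 10.0, "dns": 1.0}


def rate_networks(observations):
    """Rank origin ASes by weighted count of vulnerable hosts.

    observations: iterable of (origin_asn, service, vulnerable_host_count).
    """
    scores = defaultdict(float)
    for asn, service, count in observations:
        scores[asn] += WEIGHTS.get(service, 1.0) * count
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented observations, keyed by the AS that originates the prefixes,
# which may sit behind the AS you actually peer with.
observations = [
    (64496, "ntp", 200),
    (64497, "ntp", 17000),
    (64497, "dns", 300),
    (64498, "memcached", 5),
]

ranking = rate_networks(observations)
for asn, score in ranking:
    print(f"AS{asn}: {score:.0f}")
```

Keying on the origin AS rather than on the peering AS is what surfaces a problematic downstream network behind an otherwise reasonable peer.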
So here you can see the actual peering relationship from the Hurricane website, where KOZA telecom actually is behind MegaFon, and that's who we peer with as well. So this is interesting. Are you afraid yet of your own customers crippling your own infrastructure? Or are you waiting to fix your shizzle until they are actually being abused to cripple your own network? Because if you wait until they actually use the devices in your own network to do a DDoS attack on your infrastructure, how do you think you can start whining at companies like Akamai and Prolexic to keep your network up and running? That doesn't work if the attack comes from within your network.
Because this is what's happening. This is just an accident waiting to happen.
But perhaps, you know, you have shares in Akamai, Prolexic or Incapsula or something similar, you know, this is possible. I don't know all the reasoning for why people do or do not do things. Maybe you just like to pay for useless capacity waiting to be used for a DDoS attack. Maybe this is the answer. Sometimes men just want to watch the world burn.
So this is time for action from you. What we all need to do is to administrate all IP addresses and your customers in an IPAM. There are tools like DigitalOcean's NetBox or NIPAP. There are several solutions possible to link your customer debit or credit numbers and contact numbers in your IPAM, and have that linked to a contact e‑mail address. This is not hard to fix. This is open source, this is ready and available to implement today.
Then what you do is you use something like AbuseIO, which is a free‑to‑use abuse management system, and link that to the same IPAM that you just installed to register your IP addresses in, and then watch the magic happen. Because this is what will happen. AbuseIO can look up the actual IP address and the related contact number and e‑mail address of your customer, and automate the messaging to the customer once those messages come in from the abuse feeds, and your customers will actually be happy, because they will be notified that there is something wrong with one of their devices, whether that's a router, a server, whatever. And it's not only amplification shizzle, it's also botnet infections, compromised servers, copyright infringements, whatever. So you basically automate your complete abuse management.
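The lookup at the heart of this workflow, from a reported IP address to the customer contact registered in the IPAM, can be sketched with the Python standard library alone. This does not use or imitate the NetBox, NIPAP or AbuseIO interfaces; the in-memory `IPAM` table, the documentation prefixes, the example e-mail addresses and the wording of the notice are all assumptions for illustration.

```python
import ipaddress

# Hypothetical IPAM content: prefix -> customer abuse contact.
IPAM = {
    ipaddress.ip_network("192.0.2.0/24"): "noc@customer-a.example",
    ipaddress.ip_network("192.0.2.128/25"): "ops@customer-b.example",
}


def find_contact(ip: str):
    """Return the contact of the most specific registered prefix, if any."""
    address = ipaddress.ip_address(ip)
    matches = [net for net in IPAM if address in net]
    if not matches:
        return None
    return IPAM[max(matches, key=lambda net: net.prefixlen)]


def build_notice(ip: str, service: str):
    """Turn one abuse feed entry into (recipient, message), or None."""
    contact = find_contact(ip)
    if contact is None:
        return None
    message = (
        f"Your host {ip} runs an open {service} service that can be abused "
        "for DDoS amplification. Please restrict or fix its configuration."
    )
    return contact, message


print(build_notice("192.0.2.200", "NTP"))
```

Choosing the most specific matching prefix matters when a customer's assignment is carved out of a larger block registered to someone else.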
So this is the list of the feeds that AbuseIO can process for you, which is quite extensive, and since they are ‑‑ since they are also doing ARF formatting there are a lot of abuse messages that are coming complete in ARF formatted messaging so it's quite easy to process.
So, what are the results if you actually do this work? Which is a lot cheaper than to keep buying additional capacity.
Happy customers. Even better, happy managers. And if that doesn't motivate you, happy peers. So in the end, everybody wins.
Right, questions...
(Applause)
CHAIR: Thank you, Erik, and everyone, please come up to one of the microphones and remember to state your name and organisation before you ask your question.
AUDIENCE SPEAKER: This is Kostas Zorbadelos from OTE Greece. Have you actually done any real work in a production network? Have you cleaned it up?
ERIK BAIS: Yes.
KOSTAS ZORBADELOS: And was it that simple?
ERIK BAIS: Yes.
KOSTAS ZORBADELOS: Okay.
AUDIENCE SPEAKER: We are a really small ISP and hosting provider, I think we say Cloud provider right now. First, thanks for repeating this again, I think this is never enough. We mainly use the creator reports to clean up our open ports for DDoS protection. We never really get some real abuse reports, because we are a really small network and I don't think ‑‑ well all networks will be done faster than almost anybody else in the room. But we contacted our customers by mail. Mainly it took one mail, two mails, three mails, and it ended with closing the ports, simply because most of our customers didn't do anything, or with rate limiting on a specific port. Okay, there is almost 30% that do something, but I don't think this is really as simple as you seem to say.
ERIK BAIS: So, if you look at the various reports that you are using, the different feeds, specifically if you have the complete abuse management automated, having the different feeds going in there will actually help to inform the customers that something is not, you know, correct. And I like the e‑mail loop that you get from AbuseIO, because it actually explains to the customer why they are getting the e‑mail and what's going on, and it has links to how they can fix it. So it's actually self‑explanatory to them. Obviously, if they are running a server and they have no idea how to follow instructions from an e‑mail, you should wonder whether they should actually be allowed to run a server that has issues on your network. But on the other hand, it may also be a business opportunity to actually charge some customers for additional support. But having the customers informed will actually, you know, keep a lot of issues out of your network. Sometimes it's customers that are caught up in a botnet malware infection, those kinds of things, and sometimes it's just: patch your Memcached server, or do something with your NTP server. Doing nothing is not an option. We are well beyond that.
AUDIENCE SPEAKER: I agree. Just one more addition. Do you intend to give public access to your API to have ‑‑
ERIK BAIS: Yes.
AUDIENCE SPEAKER: So know what is on my network and check with external tools?
ERIK BAIS: Yeah, so as you have seen, we only look at the aggregated data. If you want to have the specific data: the data that we receive, we get it from Shadowserver. You can get the detailed reports from Shadowserver. You can also import the Shadowserver feeds for your own network; request them at their website and you will get detailed reports on which IP addresses are actually infected and have the issue. We only look at the aggregated data per AS. If you want to have access to the API, I can provide you with a user ID for it.
AUDIENCE SPEAKER: Okay. Thank you.
AUDIENCE SPEAKER: Kevin Meynell from the Internet Society. It's more of an announcement than a question. There is an initiative running at the moment called MANRS, Mutually Agreed Norms for Routing Security, which is very much along these lines. I just want to make an announcement about that: go to the URL, and we have also got a stand with some T‑shirts on Thursday, so even if you don't go to the URL, you can come and get a T‑shirt and find out what it's all about.
AUDIENCE SPEAKER: Benno Overeinder. Two questions. So, Kevin just mentioned MANRS, and it's kind of complementary, from my understanding, so it's also about network hygiene, and that is also what you propose: look at what you host in your network. So, a question to you or to the room: do you also consider it part of your network hygiene to do what MANRS asks you to do, as a minimum kind of thing?
ERIK BAIS: You need to have an Olympic minimum. So ‑‑ but just doing BCP 38, for instance, will not fix this issue. Because there will always be a network that does not fix spoofing, and specifically if you look at the booter sites, they will also be on a network that allows spoofed traffic out of their network, and every single amplification‑vulnerable service in your network will happily reply to whatever it receives. So the only actual fix for this is to fix the vulnerable servers in your network and then do, you know, the other stuff as well.
BENNO OVEREINDER: Thank you. Second question. I am not an operator; did you clean up a network? But you talked about customers, and you have maybe customers which are other networks. What if the customers are end users, like me? Is there a difference? Might cleaning up my mess be more difficult with end users?
ERIK BAIS: It depends on how stubborn you are.
BENNO OVEREINDER: I can imagine at Liberty Global they have hundreds of thousands of end users like me, people at home that are not aware of running their DNS ‑‑ that run an open ‑‑
ERIK BAIS: For them it's not a thing that ‑‑ I think they don't care enough, for instance. Because if you look at the amount of users, you know, if you look at SURFnet for instance, as a prime example, they have a very nice clean network, they are very active in this, and they run a lot of research, different kind of servers, a lot of students, and if you look at their rating and the amount of vulnerable servers they have, they have more IP addresses in Liberty Global or and they have a really sizable network. So why can they do it and somebody else can't? It's just because they don't care. And that's the only conclusion that I can come up with. And that's what we need to change.
BENNO OVEREINDER: Thank you.
CHAIR: All right. Thank you everyone. And now on to one of my favourite ‑‑
(Applause)
Now, on to one of my favourite parts, the lightning talks. So we are going to hear about the new Internet, the old one is over folks!
JORDI PALET: If you expect from me a talk about IPv6, you need to wait for the rest of the week. I am talking about the new Internet from a different perspective. This is actually a talk for 30 minutes so I'm going to run a little bit. I want to do a very, very quick introduction to HTTP/2 and QUIC, DoH and something else. I am waking you up in case you are not following all the new work in the IETF, some of which is not so new.
The thing is here, more and more in the recent years we have been moving traffic to HTTP and HTTPS ports. Up to now only DNS escaped from that, but it's also coming. Obviously that has some advantages because having everything in a single port, it means it's easier to control everything, so that's probably one of the reasons for that. And we have the perception with that that we can clean up better everything and control everything much better, right. So, we have also the perception that this is probably an opportunity for improving security and privacy. Very quickly, I'm not going through all the text in the slide. Just to give you an idea.
We started with HTTP/0.9 and then HTTP/1.1 in 1999, and at that time it was a few objects, a few kilobytes, and now we have a different situation. So in 2009, Google started to post some information about their work related to SPDY, which basically means they were trying to multiplex different requests across a single TCP connection, compress the headers and a few more things, including allowing the servers to do push.
That became, in 2012, an IETF Working Group and we have had a standard since 2015, so that's HTTP/2. HTTP/2 doesn't require HTTPS, but the fact is that most of the browsers, I would say all of them, basically only implement HTTP/2 with TLS support. Of course, that should not be a problem because we have tools like Let's Encrypt which allow everybody to have HTTPS without paying for certificates and so on, but that's not really happening.
Okay, there was a small change regarding the protocol that SPDY was using versus what HTTP/2 is using. And there was also a big change in support: SPDY in 2016 was already over 90% worldwide, while today HTTP/2, I looked at it about 30 minutes ago, is only at 26.1%, but this is in terms of number of websites. If you look at which websites are using it, in terms of traffic it's much higher. So there is a big penetration of HTTP/2.
Very, very quick summary. Again, not going through that, about how you have the format in HTTP/2, you can take a look on the slide over there.
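The frame format mentioned here can be illustrated with a minimal Python sketch. It assumes the fixed 9-octet frame header defined in RFC 7540 (24-bit length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier); the names `FrameHeader` and `parse_frame_header` are hypothetical and not taken from the slides.

```python
import struct
from typing import NamedTuple

# Frame type codes as listed in RFC 7540, section 6.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


class FrameHeader(NamedTuple):
    """Hypothetical container for the fields of one frame header."""
    length: int      # payload length in octets (24 bits on the wire)
    type_name: str   # human-readable frame type
    flags: int       # raw 8-bit flags field
    stream_id: int   # 31-bit stream identifier


def parse_frame_header(data: bytes) -> FrameHeader:
    """Parse the first 9 octets of an HTTP/2 frame."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 octets long")
    # The 24-bit length is read as a high octet plus a 16-bit word.
    length_hi, length_lo, frame_type, flags, stream = struct.unpack(
        "!BHBBI", data[:9]
    )
    length = (length_hi << 16) | length_lo
    stream_id = stream & 0x7FFFFFFF  # mask out the reserved top bit
    type_name = FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:02x})")
    return FrameHeader(length, type_name, flags, stream_id)


if __name__ == "__main__":
    # A HEADERS frame with a 13-octet payload on stream 1.
    print(parse_frame_header(bytes.fromhex("00000d010500000001")))
```

Because many streams share one TCP connection, the stream identifier in every frame header is what lets the receiver reassemble the multiplexed requests and responses.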
There is a demo. I'm not running the demo but you have the links over there so you can take a look afterwards.
There is also a small extension for Chrome and Firefox so you can see if HTTP/2 and the other protocols I am talking about next are enabled on the website you are visiting.
And going to QUIC. During the SPDY development, it was obvious that TCP is inefficient for most of the actual usages, so they started to work on what we call QUIC, Quick UDP Internet Connections. In 2016, a working group was set up to develop a UDP‑based, stream‑multiplexing, encrypted transport protocol, which, at the moment, is only looking at the initial use case, which is HTTP over UDP. It's already deployed by Google and, according to the stats I have been looking at, it's around 10% of Internet traffic. In very, very short, it's transport over UDP, typically implemented in the application, not the kernel, and the functionality is equal to TCP plus TLS plus streams. It already includes TLS 1.3 to establish session keys and to encrypt all the traffic, including ACKs. It enables 0‑RTT, we'll see that later. And in the actual draft, this is still not a standard, as of today the draft is version 11, only a few parts of the short header, which is used for all the packets except the handshake, remain unencrypted. So this disallows passive measurements, and there is another draft, which is called the spin bit, which basically is deciding, there is a long discussion on that, whether we are going to use just one bit or two bits to make those measurements possible.
So this is a quick slide about HTTP only, HTTP with TLS, and QUIC. You can see that with QUIC the repeat connections can go to zero milliseconds.
This is a comparison between HTTP/2, which is what I spoke about in the first slides, and HTTP over QUIC; you see the TLS 1.3. DNS over HTTPS.
Actually, this is using only HTTP/2, so basically the idea is that the IETF DNS over HTTPS Working Group is working on standardising the encoding of DNS queries and responses over HTTPS. The transport is suitable for both traditional DNS clients and native web applications that use DNS, and it is not just a tunnel of DNS over HTTP, it's a little bit more complex, so you really want to take a look at that.
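To make the encoding concrete, here is a minimal Python sketch of how a DNS query can be carried in an HTTPS GET request. It assumes the scheme later published as RFC 8484 (wire-format query, base64url without padding, in a `dns` parameter), which may differ in detail from the draft current at the time of this talk. The resolver URL and the function names are hypothetical, and nothing is sent over the network.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (RFC 1035).

    qtype 1 is an A record. The ID is 0, as is usual for DoH so
    that identical queries produce identical, cacheable URLs.
    """
    # Header: ID=0, flags=0x0100 (recursion desired), one question.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, ended by a zero octet.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return header + question


def doh_get_url(resolver_url: str, name: str, qtype: int = 1) -> str:
    """Return the GET URL carrying the query, base64url without padding."""
    wire = build_dns_query(name, qtype)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


if __name__ == "__main__":
    # "https://dns.example/dns-query" is a placeholder resolver.
    print(doh_get_url("https://dns.example/dns-query", "www.example.com"))
```

A real client would send this URL with an `Accept: application/dns-message` header and parse the binary DNS response from the body, which is part of why this is more than a plain tunnel.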
And finally, there is also a draft which is actually version number 3, which is DNS over QUIC, and here you have the format of it, and I think that's the last slide, so there is here a compilation of the different options for DNS transport and I think I am good with time, so basically, I'll stop here. There are some conclusions, if you want to take a look. I'm not sure if we have time for questions. Anyway, I'm here all the week...
BRIAN NISBET: So we have got all that. We have time for one very quick question, if somebody has one.
AUDIENCE SPEAKER: Hello. Radu Feurdean, Coriollis Telecom. I see in all those new and fancy protocols, we have encryption everywhere, so basically we want to encrypt the hell out of everything. On the one hand, for us, we see this as a good thing. On the other hand, we have law enforcement, we have some other legal requirements and stuff like this which all require that we log all traffic, that we filter the hell out of it. But... yeah... okay. Censorship is sometimes part of the law. Again, logging everything, including encrypted stuff is part of the law. So we have to reconcile it somehow, those points, and how do we do it if we start encrypting the hell out of everything? We already have man in the middle SSL stuff, how do we do it?
JORDI PALET: Well, the design of the protocol is not really done for law enforcement authorities. There should be some point, of course, but I'm not really able to answer ‑‑ not ready to really answer that question here. I think it's something beyond the design of the protocols. There are probably other fora. But this is also protecting end users, right, so...
AUDIENCE SPEAKER: End users, but not operators.
BRIAN NISBET: If this takes 15 seconds, Randy.
RANDY BUSH: Randy Bush, IIJ. Feel free to record all the encrypted traffic you want. As Jeff Schiller said, when law enforcement is made too easy, it's called a police state.
(Applause)
BRIAN NISBET: So, our second lightning talk is Guillaume Mazoyer, on easing peering session management.
GUILLAUME MAZOYER: Hello, everyone. So my name is Guillaume. I am French, so I speak bad English, of course. I am here to talk about Peering Manager, which is a project I started on my own, basically because peering configuration is boring, we all know that. Peering is fun but peering configuration is boring. So, we are here to talk about it.
So, I designed a tool to make it automatic, automation, everything. Why is it so hard? Basically, it should not be hard because we all have scripts here and there for automation, some Ansible scripts, some other scripts as well, and we also have maybe, some of us, copy and paste, basically. With some variables to change during the copy paste. What I wanted to create is an automation tool that can basically replace all this and be usable by anyone which ‑‑ who can be an engineer, a technician or basically anyone.
So, first of all, who am I? I am just a network engineer without an awful lot of experience, just four years basically, and I'm used to contributing to Open Source software with several projects you probably know about, like Debian, and I have created some software of my own like Looking Glass, maybe some of you use it. Basically, Looking Glass is a tool to show some stuff that you have in your router, and the existing Looking Glass was quite ugly, so I decided to redo it on my own, with Bootstrap, because I am not good at design. I am an engineer.
I have also done a PeeringDB Go API, which is unknown, but which is just for fun.
So peering automation, for me, should be easy, automated of course, because it's automation, and ‑‑ how can I say that ‑‑ it should be painless, making peering great again basically, as Trump would say.
So, every organisation which is a carrier network, an ISP or which actually peers around the world at different internet exchange points might need at some point some kind of automation because we have some peers like 1, 2, 3, 4, 5, 6, then it becomes 10, then 15, then 50, then 100, then 1,000, so it depends on the size of your organisation. And like I said before, it's quite redundant to do the same thing over and over again. So, here it is.
As you can see, if some of you use NetBox, I did not invent anything more than NetBox, it's just the same design because I am a lazy guy, or lazy bastard, I guess, and here is the interface: manage your exchange points, the autonomous systems you are peering with, your communities, your templates and everything.
Just some views I created to do everything. And it has an interesting feature, which is an integration with PeeringDB through an API. It allows Peering Manager to import everything, basically: if you have a proper PeeringDB record, you can have Peering Manager import everything it has from the PeeringDB API, and just do everything by itself. Again, another view, another view, another view. We don't have time for this, as well, as well, as well. This is a standard configuration that can be used by Peering Manager, which is a Jinja2 template. It can be used to create a configuration and then this configuration can be pushed to your router without any action from your side. Basically, it uses the NAPALM framework, maybe some of you know about it, there was a talk about it this morning as well.
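The idea of generating router configuration from a template plus a list of peers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's `string.Template` as a stand-in for a full templating engine such as Jinja2, and the peer data, the group name and the Junos-style `set` lines are hypothetical examples, not Peering Manager's own template.

```python
from string import Template

# Hypothetical per-peer snippet in Junos "set" style; a real template
# would normally loop over the peers inside the template itself.
PEER_TEMPLATE = Template(
    "set protocols bgp group $group neighbor $address peer-as $asn\n"
    'set protocols bgp group $group neighbor $address description "$name"\n'
)


def render_peering_config(group, peers):
    """Render one configuration snippet per peer and join them."""
    return "".join(
        PEER_TEMPLATE.substitute(
            group=group,
            address=peer["address"],
            asn=peer["asn"],
            name=peer["name"],
        )
        for peer in peers
    )


if __name__ == "__main__":
    # Documentation addresses and private-use ASNs, for illustration only.
    peers = [
        {"address": "192.0.2.1", "asn": 64500, "name": "Example Peer A"},
        {"address": "192.0.2.2", "asn": 64501, "name": "Example Peer B"},
    ]
    print(render_peering_config("IXP-PEERS", peers), end="")
```

Keeping the peer data separate from the template is what removes the copy-and-paste step: adding the hundredth peer is one more record, not one more hand-edited block.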
And, for example, we have some sessions. If you have connected your router with the NAPALM framework through the peering manager tool, you can see if the session is established, if you received the route from your peers, etc.
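A session overview of this kind can be approximated with the following Python sketch. It assumes the dictionary layout that NAPALM documents for its `get_bgp_neighbors()` getter (a VRF name mapping to `peers`, each with `is_up`, `remote_as` and per-address-family prefix counters). The sample data is invented and the function name `summarise_sessions` is hypothetical; no router is contacted.

```python
def summarise_sessions(bgp_neighbors):
    """Return (address, remote AS, state, received prefixes) per peer."""
    rows = []
    for vrf in bgp_neighbors.values():
        for address, peer in sorted(vrf.get("peers", {}).items()):
            # Some drivers report -1 when a counter is unavailable,
            # so negative values are treated as zero here.
            received = sum(
                max(family.get("received_prefixes", 0), 0)
                for family in peer.get("address_family", {}).values()
            )
            state = "established" if peer["is_up"] else "down"
            rows.append((address, peer["remote_as"], state, received))
    return rows


if __name__ == "__main__":
    # Invented sample in the assumed get_bgp_neighbors() layout.
    sample = {
        "global": {
            "router_id": "192.0.2.254",
            "peers": {
                "192.0.2.1": {
                    "remote_as": 64500,
                    "is_up": True,
                    "address_family": {"ipv4": {"received_prefixes": 120}},
                },
                "192.0.2.2": {
                    "remote_as": 64501,
                    "is_up": False,
                    "address_family": {"ipv4": {"received_prefixes": -1}},
                },
            },
        }
    }
    for row in summarise_sessions(sample):
        print(row)
```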
This tool is written in Python with the Django framework. It is still under development because it started about nine months ‑‑ I started it about nine months ago. It uses NAPALM, if you want to you can use it without the PeeringDB API as well, and we have some future development to be done, like discovering ASNs automatically, removing abandoned peering sessions, prefix for router, maintenance of IRR objects, session passwords, something that is not implemented yet. Notifications through mail, through Telegram or whatever. This tool is available for free, of course, and is free Open Source. You can check it out at this URL.
Some other operators have already contributed to it through donations, through hardware testing, through adopting it, basically. For the record, I am part of LuxNetwork, so I am using it at my work, and basically that's all. So if you have any questions, and if you don't get the references, you have to watch more good French movies.
CHAIR: All right. Well, thank you, Guillaume. And for the record, your English was great and I am an American, so we all know I don't speak English right.
All right. We have time for a couple of questions, if anyone has any.
GUILLAUME MAZOYER: Nobody has any, so it's clear.
CHAIR: And now William is going to give us an update on the RIPE accountability Task Force.
WILLIAM SYLVESTER: Greetings everybody. Thank you for sticking around. I know it's the end of the day, we'll try to make this quick. First and foremost, we are going to have an accountability BoF right after the session in the side room, we invite everybody to come and attend. We wanted to give a quick update to the Plenary just to keep everybody in the loop. We have been working on this task force now for about a year‑and‑a‑half. We started the task force in the Madrid meeting. Since then we have spent a lot of time reviewing the structures, processes and documentation.
So, what does that really mean? We have actually gone through the mailing lists. We have reviewed all the policies. We have looked at the structures, how the database ‑‑ sorry, how the Working Groups all come together and what it means to sort of be a Chair. All of this comes together in how we, as a community, check ourselves. How do we make sure that we are protected from things like capture and other organisations taking things over? And we're working on publishing a draft. We have got a draft that we have put together right now. We have a few other items that are coming together that we'd like some feedback from the community. That's the purpose of the BoF today.
But we have been looking at things like the community values, how we define consensus, how consensus compares to things like rough consensus. And ultimately, how do we preserve what our community is and how our community interacts and maintains trust over time? We saw Hans Petter talk this morning about his 25‑year term that he is currently serving as the RIPE Chair. But he is the first Chair to take over, and how do we plan succession going forward to sustain the ideals that have created our community?
And, you know, so, from that perspective, we have delved deeply into a lot of these things. We are looking to publish a report coming up here in the early summer. A lot of that requires some of the feedback that we're looking for from the BoF. It's been information that we have debated internally in the task force, we have taken the feedback from the previous meetings, discussions that have been on the RIPE list, we have integrated that into a report, and we feel that ultimately our community is accountable. And you know, with that, we feel pretty good about where we stand. But there are a handful of things that we're going to come back to the community with that at some point should probably be looked at and should be considered.
So, with that, we hope everyone can attend. Once again we'll be in the side room after this session. And thanks again.
(Applause)
BRIAN NISBET: We said lightning. I think that might be a question... I mean, there is obviously a BoF where you can discuss all sorts of things with William and the rest of the task force.
AUDIENCE SPEAKER: Randy Bush. Looking up accountability, it's about blame, right. And I think your presentation was better than your title. I think what our goals are, what our ethics are and what our methods are are important, and our culture, necessarily maintaining it makes some assumptions upon what parts of it are ethical, etc. But I'd just ‑‑ and one of the key parts of the solution might be a word that I didn't see at all, which is transparency.
WILLIAM SYLVESTER: Sure.
BRIAN NISBET: Surely the easiest part of any culture is keeping the good bits while exactly getting rid of the bad bits. No, it's the other.
AUDIENCE SPEAKER: Rudiger Volk. Kind of, yes, for ‑‑ in‑depth discussion for a meeting is needed. On the other hand, I'm a little bit disturbed by having a presentation that is essentially just meta information. You are saying we have these topics, but what the content of the topics is, of course, is the crucial thing.
WILLIAM SYLVESTER: Absolutely.
RUDIGER VOLK: And ‑‑
WILLIAM SYLVESTER: One of the things we're hoping to cover in the BoF is to dig into a little more detail. We didn't think it was appropriate to spend an hour or more digging into every single detail in this forum, which is why we set up the other meeting. But within that, anything that you'd like to discuss, we're happy to discuss.
RUDIGER VOLK: Taking up Randy's thing about blame, at least an URL for the draft document could have been in here.
WILLIAM SYLVESTER: I appreciate that.
BENNO OVEREINDER: So the thing is the PC is kind of this also well scaled, asked William to give a lightning talk to talk about the BoF afterwards. So it's just ten minutes to make people aware there is a BoF, and I think it's not fair to comment on what he didn't present yet, because he will present everything at the BoF.
WILLIAM SYLVESTER: Of course. Thank you.
BRIAN NISBET: By the way, for those wondering why there is now two on that side and one here and one all the way up there, it's because we suddenly realised that people with accessibility issues who could not get down the stairs would not be able to get to the microphone, so yes, it means there is a slightly longer run on this side but it is obviously to make everybody able to participate in the meeting, which is rather important. So this side‑bar conversation is not a question ‑‑ thank you very much.
(Applause)
So that concludes the Plenary Session for this afternoon. Thank you all very much. Please remember there is a number of BoFs this evening ‑‑ I am also going to ask people to rate the talks. There we go. I was getting there. So, there are BoFs this evening, please go to them. There is the welcome reception, the social reception this evening as well. Please rate the talks. I think there are prizes and things and other excitement, but, most importantly, feedback to the PC, and if you are interested in becoming a member of the PC, then we are holding elections for two spaces this week. So thank you all very much and see you, if not before, nine o'clock tomorrow morning.
(Applause)