MAT Working Group
Thursday, 17 May 2018
4 p.m.
CHAIR: Good afternoon everybody. People in the back, feel free to come to the front; there is a lot of room up here. People up front, feel free to go to the back. Just for interaction and so on it's a lot easier and quicker if we're a little less spread out. I'm going to go ahead and start talking. We're not quite started yet, one more minute, but we have a very full agenda today. I am Brian Trammell. I have been a co‑chair of the MAT Working Group since Dubai, but this is my first time up on stage saying anything other than 'Hi, I'm Brian Trammell.' We hope you enjoy today's programme and session. We didn't really do this on purpose, but we have a rather web‑measurement‑focussed agenda today: four technical talks, a review from the RIPE NCC tools team, and Nina is telling me I need to speed up a bit.
STENOGRAPHER: Not too fast!
BRIAN TRAMMELL: Then a quick talk at the end about some measurements from MANRS the routing hygiene project.
So, with that, if nobody has any suggestions, comments or questions for me, I'll see you on the mailing list and we will start with Patricia.
CHAIR: We should mention the scribe and the Working Group participation from Jabber.
BRIAN TRAMMELL: Yes.
Nina, we have a scribe. It's Alan, thank you very much for doing this. Do we have anyone monitoring the Jabber? Yes, Miriam here is going to be taking questions from Jabber, so from online participation. And she is not late, we are starting early. Don't worry Miriam, you'll be fine. And of course we have our amazing stenographer here as well, so we can read what we're saying even when we're not speaking very clearly. And I think that was it. We have a question over there.
AUDIENCE SPEAKER: How is your leg?
BRIAN TRAMMELL: It's getting better, you can see I'm not limping around with the cane any more, it's day by day. Thank you very much for your concern.
So, with that we'll go straight to Patricia, talking about ad based measurements from the edge of the network.
PATRICIA CALLEJO: Hello. I am here to talk about the opportunities of ad based measurement from the edge of the network. Before starting I would like to thank you for giving me the opportunity to speak here.
I will start from a very basic concept: tens of thousands of ISPs give network access to millions of users every day, and for this reason the quality of experience received by the end user is very important. This may be affected by the ISP's network design, policies and network configuration.
Furthermore, researchers have discovered new practices in the wild, like DNS manipulation and HTTP header injection.
We consider it very important to perform Internet measurements from the edge of the network, from the end user perspective. There are several projects that have been doing a very good job in this area: RIPE Atlas, Archipelago, Netalyzr and Luminati. The first two are based on having dedicated nodes located around the world, and they perform measurements from these nodes. There is also an Android application to perform measurements, and Luminati is a browser plugin through which measurements can be run.
Just to compare these projects a little: RIPE Atlas and Archipelago have a certain number of nodes, while Netalyzr and Luminati have a certain number of sessions. In terms of targeting, none of these projects has the ability to set targeting capabilities; for example, if you want to measure the United States, you cannot say "okay, I want to run my experiment there". This is one of the features that we would like to highlight here.
In terms of time, Luminati has a certain amount of time to run the session. In terms of ISP coverage, the node-based projects have a low number of nodes around the world, while the others cover a high number of ISPs. And in terms of measurement capabilities, Luminati is limited, while all of the other projects have full control of the measurements they want to run.
So, I would like to present AdTag, an ad-based measurement methodology that leverages the nature of ad networks, which reach a large number of ISPs in a short period of time. It is automatic, and we can use the targeting capabilities of the ad networks, as I mentioned before: if we want to perform measurements in a specific area or network, we can do it.
I will go through the architecture of this methodology, the technical aspects, the deployability, the cost, the ethical aspects and the targeting capabilities.
Starting with the architecture: we will have an HTML5 advertisement with JavaScript code inside. This advertisement will go through a DSP (demand-side platform), in which we will set different targets like geolocation, devices or operating systems. Once our ad is in the DSP, it will go through the rest of the ad chain and be shown by the different publishers. Once it is served to a user on a website, we will run our experiment in the JavaScript code and send the information to our content server.
So as I said, this is basically an advertisement with JavaScript inside. We are limited to the browser libraries: we have XMLHttpRequest to perform HTTP requests over TCP, WebSocket to open a tunnel between our advertisement and our server over TCP, and WebRTC in order to get the addresses of the user, which I will come back to in the NAT detection use case.
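To make the browser-side constraints concrete, here is a minimal, hypothetical sketch of the first two primitives as they might appear inside the ad's code; the collector host and endpoint names are invented for illustration, and this is not the actual AdTag code. WebRTC is shown later in the NAT detection use case.

```ts
// Minimal sketch of two measurement primitives available to an ad
// (illustrative only; MEASURE_URL and its paths are assumptions).
const MEASURE_URL = "https://collector.example.net"; // hypothetical control server

// 1. XMLHttpRequest: a plain HTTP request over TCP, timed from JavaScript.
function timedHttpRequest(): void {
  const started = performance.now();
  const xhr = new XMLHttpRequest();
  xhr.open("GET", MEASURE_URL + "/probe");
  xhr.onloadend = () => {
    const elapsed = performance.now() - started;
    // Report the result back to the content server.
    navigator.sendBeacon(MEASURE_URL + "/report", JSON.stringify({ type: "xhr", elapsed }));
  };
  xhr.send();
}

// 2. WebSocket: an application-level tunnel between the ad and the server.
function openTunnel(): void {
  const ws = new WebSocket("wss://collector.example.net/tunnel");
  ws.onopen = () => ws.send("hello-from-ad");
  ws.onmessage = (ev) => console.log("server said", ev.data);
}

timedHttpRequest();
openTunnel();
```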
It can be deployed in different DSPs; within the DSP we can set different targeting aspects like the geographical location, the browser brand, the device type and the operating system.
In terms of cost, we want to maximise the number of impressions at the minimum possible cost, and we try to put a cap in place so that we do not get too many clicks. We are using the minimum CPM (cost per mille, the cost per thousand impressions), so we get around 1 million measurements per $100 (at that price the effective CPM is roughly $0.10), which is considered low cost.
In terms of ethical aspects, I consider this an important point because we do not have the user's consent; we are running an experiment in the background of this advertisement. We take the Do Not Track header into account: if the user has this option enabled, we must not run any experiment, because the user asked for that. Beyond that, we encourage the whole community to follow these guidelines: do not use an excessive amount of data on the devices we are running on; do not collect any personal information that may compromise the security of the user; do not run anything that may create a risk for a company, for example running a DDoS attack behind an ad or something like that; and do not run any experiment that may compromise the integrity of the user, like forwarding user information to Facebook if we are running experiments which are sensitive.
So, this is an evaluation of the methodology. We look at the limited time we have in a session with the advertisement, the targeting accuracy and the browser support.
Starting with the execution window, which is the time the advertisement is active in its placement: we have to take this into account because some users may only briefly stay on the website showing our ad, so we have to plan our experiment accordingly. We can see that we have around 30 seconds, but we have less time on mobile phones than on desktop, which is important in order to set up our experiment.
In terms of targeting of ISPs, we ran a campaign with around 3 million measurements around the world, and we can see here the countries and the percentage of measurements per country. We see a higher percentage of measurements in the US, but we have measurements from more or less all countries. What we want to evaluate here is the accuracy of the methodology when targeting a specific location, and we ran a specific campaign for this: we found that 97% of these measurements came from the United States and 2% came from Canada, which we consider reasonably accurate.
In terms of browser support, not all browser versions support these libraries. Overall we can see that Chrome is one of the most used browsers, but we need to look at which versions of the browsers support WebRTC and WebSocket. We can see that desktop Chrome fully supports these libraries, but in the case of Mobile Chrome and Mobile Safari we do not have full support, which we have to take into account in order to set up our experiment.
So, that is the methodology, and I want to present some possible use cases it has. The first one is middlebox detection. For example, in this experiment we have a PC with the advertisement and we have our control server. From the advertisement we open a TCP connection to the server, and our server replies with a reset message. Now suppose there is a man in the middle on this connection: what will happen is that we send the TCP connection attempt from the advertisement and the man in the middle replies with an acknowledgment instead of the reset. That is how we propose to detect these kinds of boxes.
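As a hedged illustration of this use case (not the speaker's actual code): if we assume the control server is configured to reset every incoming connection, then from JavaScript inside the ad a connection that nonetheless opens successfully hints that a middlebox answered on the server's behalf. The host names below are hypothetical.

```ts
// Sketch of the middlebox / transparent-proxy check described above.
// Assumption: rst.collector.example.net answers every new TCP connection
// with a reset, so from JavaScript a *successful* connection can only mean
// that something in the middle completed the handshake.
function detectMiddlebox(report: (verdict: string) => void): void {
  const ws = new WebSocket("wss://rst.collector.example.net/");
  ws.onopen = () => report("connection succeeded: a middlebox answered for the server");
  ws.onerror = () => report("connection failed as expected: no middlebox detected");
}

detectMiddlebox((verdict) => {
  navigator.sendBeacon("https://collector.example.net/report", verdict);
});
```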
The next use case is NAT detection, with a similar scenario. In this case we send a STUN request from the advertisement and we get the public IP address, the private address and the NAT type of the user.
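A minimal sketch of how the STUN request can be issued from the ad via WebRTC: the browser's ICE gathering exposes host (private) and server-reflexive (public) candidate addresses. The STUN server shown is a commonly used public one and only an example.

```ts
// Sketch of the NAT detection use case: gather ICE candidates via WebRTC.
// Host candidates expose the private address, server-reflexive (srflx)
// candidates expose the public address as seen by the STUN server.
function discoverAddresses(report: (kind: string, address: string) => void): void {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
  });
  pc.createDataChannel("probe");               // needed so ICE gathering starts
  pc.onicecandidate = (ev) => {
    if (!ev.candidate) { pc.close(); return; } // null candidate = gathering done
    const fields = ev.candidate.candidate.split(" ");
    const address = fields[4];                 // connection address field
    const kind = fields[7];                    // "host" (private) or "srflx" (public)
    report(kind, address);
  };
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
}

discoverAddresses((kind, address) => console.log(kind, address));
```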
More use cases can be built on this methodology, like CDN performance, to see which replica we are served depending on the location of the client, or IP classification, which is more basic stuff like countries and statistics about IPs around the world.
So just to summarise, comparing with the first slide, the most important thing we add is the targeting capability of this methodology, which we consider important. But we are not perfect, of course; this is just an alternative way of measuring the Internet, and we do not have full control of this methodology because it runs in the browser, so we have some limitations on that point.
So that's all. If you have any question or comment, I will be happy to answer.
BRIAN TRAMMELL: Thank you very much. We have got about five minutes for questions on this one.
BRIAN NISBET: Just wondering, in relation to this kind of methodology, with the growing use of ad blockers and the seemingly ongoing problems with online advertising, what do you see as the future?
PATRICIA CALLEJO: That's a good question. The point is that there are ad blockers, but there are also premium advertisements that you will still see even if you have an ad blocker, so not all advertisements will disappear. I think advertising will always be there.
BRIAN NISBET: That's a potentially very depressing future but thank you.
AUDIENCE SPEAKER: I was wondering if your measurement targets change if you increase the money that you use for placing the ads?
PATRICIA CALLEJO: Yes, of course. We are trying to reduce the amount of money that we use for this methodology because we are at a university, but if you want to have more measurements you have to spend more money.
AUDIENCE SPEAKER: Do you also get a different kind of target, for example more Mac users, because you spend more money on the ads?
PATRICIA CALLEJO: No, I don't think so. This is based on the websites that are showing your ads; for example, they may be more popular websites. But for us, for performing the measurements, it's not important.
BRIAN TRAMMELL: I have a question. How can I use this?
PATRICIA CALLEJO: I can't show you the ads or the measurement that we have.
CHAIR: To actually run my own measurement campaign.
PATRICIA CALLEJO: The point is that we are using an intermediary, and we talk with them to perform our measurements there, but the code and the measurements that we obtain can be made available if you need them. If you want to reproduce it, it's easy: you just need the money and you place the advertisements.
CHAIR: Thank you very much.
(Applause)
ANTOINE SAVERIMOUTOU: Hi everyone, I am in my second year of Ph.D. at Orange Labs, formerly France Telecom. Today I am presenting a tool to perform real-time monitoring and troubleshooting of web browsing sessions. The big question is: why should we do that? The thing is that right now we are getting new trends in application-level Internet protocols; a concrete example is QUIC over UDP, and we will be having DNS over QUIC, so all of this needs to be monitored in order to know what is happening. Furthermore, we are having so many new web metrics being brought in, by us and by the research community, in order to, let's say, better quantify and qualify the end user's experience.
So, what is the goal of the tool? It's simple. All of you have an ISP router at home, and when it doesn't work you have, let's say, three options. The first option would be to talk to it: please, love, start working. Please work. The second option would be to smash it against the wall. And the third option would be to call the hotline. We are a network operator, and we need to be able to really respond to clients who are having an issue. So we need to be able to detect, for example: is the problem at home? Is it the device itself? Is it the Internet operator's network? Is the traffic degraded at the border of the network? And of course, is the distant web server misbehaving?
So we built a tool to monitor all of this. We named it Morris; it stands for measuring and observing representative information on websites. What does it look like? The thing is that you can do unit testing from the end-user side, which means you just run the code on your machine. You can do unit testing on your favourite website, or you can run, for example, the Alexa top 10K websites. For the test configuration you can set up many different configurations, because we are using the real, on-market web browsers, driven by Selenium and the corresponding drivers. The test engine drives the measurement, and during the test execution you can run the test in graphical mode, because many people are interested in keeping visual evidence of the web sessions, and then build a benchmark, a score, from real users.
What we collect: we collect all the web metrics that exist right now in the community, and we also retrieve the real HTTP archives from the browsers, let's say the network logs as exposed by the browser, to get fine details. We also do network capture in real time, and so on. And there is another module which does further computation; that means that when you go to, for example, ripe.net, you will be able to get the server location identification and the percentage of resources which have been downloaded via each protocol: is it HTTP/2, HTTP/1, is it QUIC, and so on.
Then there is an automatic graphical analysis and visualisation of the test itself, so we can look at the different timings, the locations and so on.
There are so many web metrics; I won't spend too much time on them. We have the page load time, the PLT: from the time you type in your address until the page is loaded. We have the resource timing, giving us some characteristics about the download time, the request time, etc. We have the ATF, which calculates the loading time of the visible portion of the web page as visible to the user at first glance, without scrolling the page.
Then we have the paint timing, to measure the web page load progression over time. And the last one is a metric that we pushed forward, and which is being presented at the Internet QoE workshop soon: the loading time of each resource in the visible portion of the web page. Since we are working with the net log of the browser, for each resource we get the exact details about the network: the DNS time, the response time, the request times, etc. And then we can also extract from the browser the processing time of each resource.
So, you have many configurable parameters. You can use Chrome, Firefox or Safari, in classical or headless mode; you can use it in private mode; you can use it with the cache enabled or disabled. You can tune the protocol you would like. Why do we need to make measurements by tuning the protocol? For example, at home the QUIC protocol is blocked by my ISP, so I need to be able to make measurements by asking for my preferred protocol to be HTTP/2. I can also use proxies or no proxies, I can play with the window size, we can see the RAM, we can use ad blockers, and so on; all these parameters are customisable.
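This is not the Morris code itself, only a rough sketch of what driving a real browser with Selenium and a few such parameters can look like, using the selenium-webdriver package; --disable-quic, --headless and --incognito are standard Chrome switches, and the metrics are pulled from the browser's own performance API.

```ts
// Minimal sketch of a Selenium-driven page load (not the Morris code itself).
import { Builder } from "selenium-webdriver";
import { Options } from "selenium-webdriver/chrome";

async function measure(url: string): Promise<void> {
  // Example configuration: headless Chrome, private mode, QUIC disabled so the
  // preferred protocol falls back to HTTP/2 (as in the ISP example above).
  const options = new Options().addArguments("--headless", "--incognito", "--disable-quic");
  const driver = await new Builder().forBrowser("chrome").setChromeOptions(options).build();
  try {
    await driver.get(url);
    // Pull standard web metrics straight from the browser.
    const nav = await driver.executeScript(
      "return performance.getEntriesByType('navigation')[0].toJSON();"
    );
    const resources = await driver.executeScript(
      "return performance.getEntriesByType('resource').length;"
    );
    console.log(url, "load metrics:", nav, "resources:", resources);
  } finally {
    await driver.quit();
  }
}

measure("https://www.youtube.com");
```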
For example, what do we get? This is one measurement, a really short extract, because we have been making measurements for six months and we have around 10 million sets of data; for each web browsing session we retrieve 61 parameters. For example, we made a measurement on YouTube using Chrome, deactivating the cache, asking for the QUIC protocol, with a given window size; in this window size the visible portion shown to the end user is only 47%. We also collect how the other resources are downloaded: for example, when you go to YouTube you ask for QUIC, but of course not all the servers, mainly the content servers, can reply in QUIC, so we get 54% of resources over one protocol and 46% over QUIC, and so on. Then we get the first paint, which is the first pixel on the screen, the visual loading time, the time to render. What is interesting is that we can monitor it through time, because when we only use the page load time we ask ourselves: okay, what happened during those 2 seconds? We don't really know. But with this tool you can see, for example, how many resources there are, which resources have been downloaded over time, the processing time at the browser, and so on. We also retrieve, for example, the number of domains, where they are located, whether they are secure or not, etc.
Regarding the troubleshooting itself: this is a graph of YouTube. We are monitoring in real time, let's say every two or three minutes. Monitoring YouTube in November 2017, the QUIC protocol was deactivated on the Google servers, and at the end of November we observed that at a certain point QUIC started to kick in again. The thing is, at a certain moment we were getting PLTs of 30 seconds; that's a lot. And in the real world, when you really want to know what is happening, it's really difficult.
But using our tool, and since we are using the HTTP archives, the network logs from the browser, we found the exact culprit, which was i.ytimg.com: it was the connection establishment that was really misbehaving. So this allowed us to do troubleshooting; that means if a client calls us and says "okay, it's not working", we can tell him: okay, we know that on YouTube it is a distant server that is misbehaving.
And why do we need to really monitor web browsing sessions? Because they are really, really complex. When looking at this drawing it seems like this guy went to different content servers all around the globe, but not really: in fact I am only going to the YouTube main web page. When I go to the YouTube main web page, I am going to all these content servers in real time, and we need to know that; for example, I am going to s.ytimg.com, and so on. For this drawing we used the Chrome browser with the QUIC protocol, and you can load a profile of a user; we trained a profile of a user who, for example, watches a lot of videos on YouTube, and automatically we get some Netflix advertisements, so at first we go to the US and then we come back to Dublin to get the content.
So this helps to let's say better monitor the Internet itself, the web browsing sessions.
So, what we are doing here is proposing this tool to the community. As Emile Aben once told us: make measurements and share them with the community. Just give it away. So that's what we are doing. Could we integrate this tool into the RIPE anchors? It is in a pilot phase right now. The tool is really adaptable to different parameters: if you don't want to measure something, just trash it and measure only what you want. It would also allow us to obtain finer details for IP maps: we are not only making a traceroute to, for example, YouTube, but to all the content servers we touch when doing a web browsing session.
So, to wrap up: as I said, it's not only for the RIPE anchors, so if any one of you needs the tool, wants to test it or play with it, just contact us. I am making measurements from France, and we all come from different locations in the world, so if we could make measurements from different locations it would be really nice, and we could just share the data. Right now we are also working on identifying the download link of the requests and the responses from the content servers.
What I really want is to be able to return measurements to the RIPE NCC for troubleshooting. I don't have it on GitHub right now, so if you do want to use the tool and have fun, send us a mail, because it is updated very regularly; we updated it two days ago, we have the new Firefox 60, and so on. So, if you have any questions...
CHAIR: All right. Thank you very much. Questions? Okay. So I have a question.
So, it's not on GitHub yet. There are plans?
ANTOINE SAVERIMOUTOU: If you within a the tool just send me a mail and in five minutes you'll get the tool. That's all. It's really simple.
CHAIR: So I mean, are you just basically looking to reduce the amount of overhead for dealing with requests and stuff or...
ANTOINE SAVERIMOUTOU: Right now the thing is that you have so many tools, but when you want to make measurements on Firefox you have to use one, and on Chrome you have to use another; we made one tool to measure everywhere. Right now we are collecting so much information, and now that we have so much data, we have to mine it to find what the really critical parameters during a web browsing session are.
CHAIR: Okay. Any other questions? All right. Thank you very much.
(Applause)
CARLOS VEGA: Good evening. Good afternoon. I finished my Ph.D. in computer science in Madrid; I will just mention that it was possible thanks to the projects we have with Telefónica and other companies, because of the applied nature of my contribution. Today I will present one of my works regarding multi-Gbps HTTP traffic analysis.
As we all know, this comprises multiple tasks, from the collection of the data to dissection and then analysis; this goes from the wire up to the IT manager, through a series of processes that require different levels of detail, different volumes and different versatility.
In particular, I will focus on this part today, because during my Ph.D. I faced multiple challenges in these different steps, but today I will present a work I did on an HTTP dissector for huge volumes of data.
The basic operation we are talking about is the matching of requests and responses to obtain aggregated statistics. But when we have huge amounts of data and connections, we encounter several challenges.
In the state of the art, during the development of my research, we found that the current solutions use IP and TCP connection reassembly and specific hardware, or, for example, multiple cores to obtain 20 gigabits per second. But we wanted to achieve more performance, so we had to get rid of some constraints.
So we identified a series of challenges and aims. First, we wanted to develop a tool for commodity hardware, to facilitate its deployment in different scenarios and on different kinds of operating systems. We also wanted to achieve 10 gigabits per second per core, because when you deploy a server in the network you are doing multiple tasks at the same time and you have to be very efficient with the resources you use; maybe you are doing this kind of analysis for HTTP traffic, but you are doing other tasks as well, so you have to achieve the best performance you can.
To do so, we improved the speed of matching requests and responses of the HTTP transactions, and we also improved the load balancing techniques in order to achieve higher speeds: if you have more traffic, you can distribute the load between multiple instances of the tool we propose. And of course we evaluated all of this in real scenarios with corporate traffic.
To improve the performance of the tool, we got rid of the reassembly of the underlying TCP connection, matching only the first packet of the HTTP request and the first packet of the HTTP response and disregarding the rest of the connection. This is very useful because we are talking about multiple terabytes of data. With this we can obtain aggregated statistics such as the response codes and the immediate response time, and we can identify anomalies in the other processes and tasks afterwards.
To achieve better load balance and higher speed, we proposed a different function to distribute the packets. In particular, instead of using the traditional approach of only considering the source and destination IPs and the ports, we add the acknowledgment number and the sequence number to obtain a more uniform distribution of hash values. This also avoids the heavy hitter issue, since it distributes the packets at transaction level instead of connection level. The traditional hash that many people use ensures that, when you have multiple consumers, all the packets that belong to the same connection end up in the same consumer, so you don't lose the knowledge of the connection. But since we don't need that knowledge, we only need the first packet of the request and the first packet of the response, we can add more information from the ACK number and the sequence number, whether they are requests or responses.
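A hedged sketch of the two dispatching ideas described here, just to make the difference concrete; the mixing function and field handling are illustrative and not the authors' exact implementation.

```ts
// A tiny FNV-1a style mixer over a list of numeric packet fields.
function mix(fields: number[]): number {
  let h = 0x811c9dc5;
  for (const f of fields) {
    h ^= f >>> 0;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Classic connection-level balancing: all packets of one TCP connection land
// on the same consumer, so one heavy connection can overload one core.
function connectionHash(srcIp: number, dstIp: number, srcPort: number, dstPort: number,
                        consumers: number): number {
  // Direction-independent so both directions of a connection map together.
  const [ipA, ipB] = srcIp < dstIp ? [srcIp, dstIp] : [dstIp, srcIp];
  const [pA, pB] = srcIp < dstIp ? [srcPort, dstPort] : [dstPort, srcPort];
  return mix([ipA, ipB, pA, pB]) % consumers;
}

// Transaction-level balancing as described in the talk: the sequence and
// acknowledgement numbers are added to the key, so packets of the same
// connection spread over consumers and heavy hitters are diluted.
function transactionHash(srcIp: number, dstIp: number, srcPort: number, dstPort: number,
                         seq: number, ack: number, consumers: number): number {
  return mix([srcIp, dstIp, srcPort, dstPort, seq, ack]) % consumers;
}
```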
So, we set up two scenarios: one scenario for testing the precision of the metrics against a tool that we all know from the network management area, and another scenario to test the load balancing at higher traffic rates. We used 500 gigabytes of data from corporate networks, which is around 16 million HTTP transactions, and obtained a performance of 10 to 13 gigabits per second on a single core. We can obtain more performance by using multiple instances of the same tool.
Here there is one limitation we found: since we are considering only the first packet of the request and the first packet of the response, the URL of the HTTP transaction might be truncated. So how many URLs are truncated? We compared the two tools and found that mostly URLs above 1,400 characters are truncated, and most of those contain query strings, that is, parameters for the resource that we don't need to consider for aggregated statistics, because we want to analyse these URLs no matter what parameters they have, only by the identifier of the resource.
Regarding the load balance, we did an experiment between the two different hash functions, and we found that with the traditional approach the hash values cluster together and collisions occur more frequently, but using our newly proposed hash function we reduced the collisions in the hash table and improved the distribution of the packets between, in this case, two consumers. On the left we have consumer A and on the right consumer B. With the second hash function, which is the one we propose, the collisions are under 10 for these 500 gigabytes of traffic.
Regarding the performance, we decided to compare it with TShark because its performance is well known. TShark is a much broader tool with more functionality, and it is not meant to be a high-performance tool, but we can use it as a reference: our tool is around 40 times faster than TShark and uses fewer memory resources as well.
So this work has been published in Computer Networks and the tool is available on GitHub, so you can look at it.
Regarding the future: HTTPS is encrypted and other protocols are also more difficult to dissect, but the tendency is to use logs from the applications in corporate networks and correlate this evidence with the traffic seen at the TCP layer, and so on.
So, this is all.
If there is any questions?
CHAIR: Thank you very much. Any questions?
AUDIENCE SPEAKER: Qrator Labs. So you are trying to avoid the reassembly of the TCP connection, which helps with performance. Then you tell us that you are matching only the first packet of the request and the response. Why?
CARLOS VEGA: Because that's what we need to match the transactions: we want to know the immediate response time, and also the URL, the identifier of the requested resource, and the response code, for example, so we can produce aggregated statistics like which resource got more server errors, that kind of statistics. We can process huge amounts of data; as I said, you can process 2 terabytes of data in under 30 minutes and get aggregated statistics for further inspection of anomalies.
AUDIENCE SPEAKER: So you don't care about user agents or ‑‑
CARLOS VEGA: You can sometimes find that in the first packet, but we leave the richer information to further inspection, because this is like a first stage. As I said, this is part of a chain of monitoring, and if you detect some kind of anomaly you can always go back, even to the raw capture; it's part of a bigger process of network management.
AUDIENCE SPEAKER: Got it. Thank you.
CHAIR: Any other questions? All right. Thank you very much again.
(Applause)
DIEGO NEVES DA HORA: Thank you. I am glad to be here. We're talking about quality of experience of web browsing, and quality of experience matters a lot for different stakeholders, because if you are browsing online you are generating value for these companies, watching ads and purchasing things, and if something goes wrong you would like to know, or these different players would like to know, what's going on. But it's surprisingly difficult to measure QoE. It turns out that you can get hints of the quality of experience at different layers, and the user QoE is influenced by several kinds of factors: context influence factors, human influence factors and system influence factors.
For the context: whether you are working or it is entertainment time, whether you are in an airport or at home. From the human side you have your own expectation of quality, your background, your experience. But here we are more focussed on the system influence factors, and in the system you have different stacks and different layers. On the network you can measure the quality using the usual things: latency, packet loss, and if there is a Wi‑Fi link you can get some idea of the Wi‑Fi quality as well. The network quality is going to affect the application quality. At the application you can measure using application-specific metrics: if it's web browsing you can measure the page load time, as we saw today, and there is also the speed index; if it's online video you can record the video bit rate or the number of times the video stopped for buffering. And these things are certain to influence the user experience.
Now, you could in theory directly measure the user, but be warned, this is complicated. It's better suited for a lab study because you have to reproduce the same conditions, otherwise you won't get valid results. And if you try to do that live, you would bother the user more than you would gain in real data.
There is actually an interesting line of work here, because there are engagement metrics that sit between the user layer and the application layer. So, if you are doing a live stream and suddenly people start abandoning the stream, that's an indication that something is going on; there is a QoE problem.
You can also do some measurements at the context layer. For example, you can get an idea of the device type by looking at the user agent, or the location, whether the user is in transit or at home.
So in this talk we're going to focus on a very specific and narrow set: the application quality metrics and how they correlate with the user QoE measured by MOS, the mean opinion score. But that's more or less how research in QoE goes for other Internet services as well.
We're going to talk about three instant web QoS metrics, and the use case here is going to be accessing a web page. The first thing that arrives is the main HTML file. The browser parses the document and knows what it has to do, and then it continues to fetch the objects. At some point the visible part of the web page finishes loading, and we define this as the above-the-fold time, or ATF time. Eventually all the remaining images load, all the JavaScript and the CSS load, and then the onload event triggers; that's the PLT. Two of these are easy to measure from the browser, but the other is a bit trickier; we have a proposal for it, but bear that in mind.
There are also integral metrics. The speed index is a metric that was proposed to relate more closely to user experience, and it's defined as the integral of the visual load progress. The idea is that even if two pages finish loading at the same time, if one loads the bulk of the page up front, the area above the curve is going to be very small, whereas if it takes a while to load the bulk of the content, so it's just a blank page and all of a sudden everything appears, then the area above the curve is going to be large.
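Written out, with VC(t) denoting the visual completeness of the page at time t (between 0 and 1), the speed index is exactly that area above the curve:

SpeedIndex = \int_{0}^{t_{end}} \bigl(1 - VC(t)\bigr)\, dt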
It's a visual metric, which is very interesting, but it's very processing intensive: it's very hard to instrument a computer to do that, and you cannot do it live because it would slow down the page load itself. There are also three other proposals by our research group: object index, byte index and image index. You track the number of objects, bytes or images that have finished loading; it's lightweight, and you could even do it from the network itself, but there is no guarantee that it really correlates with the user experience.
There is also recent research suggesting that perhaps you don't need to calculate this all the way up to the page load time; you could clip it a bit earlier, so we can have variations with a cut-off.
So, to better approximate the above-the-fold load time we implemented a Chrome plugin that uses JavaScript to look at the page, particularly at the images, and tries to track where they are located: if an image is visible it is taken into account, and if it is not, its load time is not taken into account. The final definition of the load approximation is up on the screen: it's the maximum load time across all the JavaScript files, all the CSS files, which are critical for loading and formatting the page, and the visible images. You might think that most of the images are visible, but they are not: this is a real page, and there are 154 images but we only track eight of them. There is a slight caveat because it's an approximation: for example, if it's not an actual HTML image, such as those highlighted in red, we cannot easily track it; there is a CSS trick to put these images into place, so that's a bit harder to find. But we still get a good approximation. It's an Open Source extension; I'll give the link at the end.
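Restating the approximation on the slide as a formula (t_r is the time at which resource r finishes loading):

\widehat{ATF} = \max\Bigl(\max_{r \in JS} t_r,\; \max_{r \in CSS} t_r,\; \max_{r \in \text{visible images}} t_r\Bigr)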
So if you have these web quality metrics, how do you use them to estimate the QoE? Ideally you assume that the QoE is just a function of one QoS metric: the ground-truth QoE is given by a lab study, and for the mapping function you either search the literature, find a proposal and see if it works for your data, or you try to fit one yourself. Here I'm going to focus on the first approach, but we actually tried both.
We expect this mapping function to look more or less like this: as the QoS deteriorates, as the page load time increases, at some point you start noticing, and then the QoE goes down until eventually the users give up and it's going to be really, really small.
So, this is our dataset, which is available. We did the tests with 224 users on a set of 12 pages. The pages were hosted on our own servers, and there is a total of around 3,400 reviews, where in each review the user accessed the page and gave a score between 1 and 5. We also annotated this dataset with all the application QoS metrics. We grouped together all the page accesses for each page with similar application QoS metrics, and we tried to see whether these metrics predict or correlate well with the QoE. Here are the results.
So, the three instant metrics, DOM, PLT and ATF, and the three integral variations too. What we see here is that overall these metrics work well to predict the QoE, with the exception perhaps of the DOM time. The PLT is what is mostly used and it works okay, but the metrics that focus more on the visual aspect, that somehow take the images or the visible portion of the page into account, tend to work better.
I'm going to take the three metrics highlighted here and show three hypotheses for the mapping function: a naive linear one, a logarithmic one and an exponential one, and also the ITU model, which is a very generic model from the research literature and is not fitted to this particular dataset. Although the PLT worked well, there are better options, and in general the exponential function is very generic and works well for the other metrics; it is the best fit for the other metrics.
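For reference, the exponential hypothesis typically takes the form below, where x is the QoS metric (for example the PLT or ATF) and the coefficients are fitted to the dataset; the exact parameter values used in the talk are on the slides, not here:

MOS(x) = \alpha\, e^{-\beta x} + \gamma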
We also tried machine learning; we did the best we could, and we found that throwing machine learning at it is somewhat overkill. You can take several features into account at the same time using state-of-the-art SVR, but you can get a very similar predictor with just one metric, just one feature.
So, to summarise: the user perspective is very important. We found that the application QoS metrics we measure correlate very well with the users' QoE, but the visual metrics show a stronger correlation. Machine learning is really on par with the expert models, so perhaps you can simply use the expert models.
As next steps, we would like to compare our approximated ATF metric with the user PLT. And we are in the process of applying this on live pages, because it's just a piece of JavaScript that can be embedded on the page itself; we are in a partnership with IP Label to track this over time.
The dataset and code are available on our page, and this is a paper in which we discuss these metrics in depth.
So thank you very much.
Questions?
AUDIENCE SPEAKER: Thanks for your work. Measuring what the user really sees is really important. I have a small comment and a question. The comment is that when we are measuring with the PLT we have to be cautious, because some websites or developers are really clever: they put so much asynchronous JavaScript in their web page that we get the PLT, but from the user experience point of view nothing is loaded yet.
And the question is: when you calculate the objects which are visible on the user's screen, what do you use to get the names of these objects? Is it the resource timing?
DIEGO NEVES DA HORA: Great question. First, I agree: AJAX-loaded content will not work with this, because it's AJAX, but for our set of pages we made sure that was not the case. We track the name of the images, which is the source both of the HTML image and of the resource timing entry, and if we do some processing on the URL, such as removing the protocol and normalising to a common name, we are able to match them most of the time. So, the resource timing and the image. When we couldn't, I usually inspected it, and it was a case where the image was put in place through CSS, most of the time at least.
AUDIENCE SPEAKER: Because the resource timing takes all the resources between the start and the end of the web measurement, which is the page load time, and the thing is that when you extract the names of the resources up to the page load time, you have this tricky part where JavaScript loads resources afterwards; we have seen this for ryanair.com and blog.me for example. So don't fully trust the resource timing; rather trust the network logs as exposed by the browser, where you get finer details.
DIEGO NEVES DA HORA: In our approach we were really only looking from the browser point of view, we were not tracking the network, but that's interesting to know.
CHAIR: Any other questions?
I actually have one. The effect that you were addressing in the previous question almost certainly overwhelms the one that I'm going to ask you about. So you have this little JavaScript widget. Have you done any calibration of how much variation you get from that JavaScript widget?
DIEGO NEVES DA HORA: You mean in the timing measure?
CHAIR: Yes, because there has been some network measurement done from JavaScript widgets.
DIEGO NEVES DA HORA: We are trusting the resource timing API, and ‑‑
CHAIR: So they have smeared that out for Spectre, for fixing Spectre: you used to get really, really good timestamps and now you get crap timestamps, because it turns out the timestamps could be used for attacks on the cache. It's in the millisecond range now, so...
DIEGO NEVES DA HORA: The thing is, the advantage of tracking the ATF compared to the PLT is that it is often, in our experience, half or even a third of the page load time.
CHAIR: Yes, so and then like the jitter here is down in the single milliseconds range, which is probably not a problem for you. Okay. Thank you very much.
(Applause)
ROBERT KISTELEKI: Thank you very much. Now for something completely different; let's stop with web browsers. I work for the RIPE NCC, and I'm going to talk about what we have done in the measurement space and what we are planning to do in the near future in the same space.
First, RIPE Atlas. Numbers: We have passed the 10,000 mark. It's still going up. That's good. Around 330 anchors are live and some of them are virtual, which I'm going to talk about a bit more later on. And all the numbers are up and to the right as you can imagine.
Some recent use cases: a whole bunch of you have been using RIPE Atlas to figure out a thing or two about the Internet, which makes us very, very happy, I must say. The first one was done by Cloudflare, and they used RIPE Atlas to measure how their service works and where it can be reached from, and so on. All the underlines are links, so you can get more details.
Does the Internet route around damage? There was a recent outage at DE‑CIX and we looked at what we could find; Emile did a previous analysis of a similar event. You can read the details there as well. And there is also how Verizon sees the world with RIPE Atlas.
I am very, very happy that members of the community are working on tools; I think the first one was created by Stefan, the second one I'm not sure, sorry for that. But there are tools and APIs that interact with RIPE Atlas to do various things. Please check them out; they are really useful.
More in‑house stuff. So we have been working on what we call measurement tagging and labelling which I will expand a bit more on. But basically, it helps you grouping your measurements, which is highly useful, I will tell you why a bit later.
Result archives: if you happen to be a researcher, we have built a new means of getting to the data, in the form of daily dumps of all the measurement results that we get. They are broken down per hour and per measurement type, so if you are interested in what kind of traceroutes have been running between 6 and 8 am, you just go to a particular website and download that data from us. You no longer have to fiddle with the API; this is all the results we get in that space, in bulk.
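As a purely illustrative sketch of how one might script against such per-day, per-type, per-hour dumps; the base URL and file naming scheme below are assumptions, not the real layout, so check the RIPE Atlas documentation for the actual paths.

```ts
// Hypothetical helper for fetching one slice of the daily result dumps.
async function fetchDumpSlice(baseUrl: string, date: string, type: string,
                              hour: number): Promise<ArrayBuffer> {
  // e.g. date = "2018-05-17", type = "traceroute", hour = 6
  const hh = hour.toString().padStart(2, "0");
  const url = `${baseUrl}/${date}/${type}-${date}T${hh}00.bz2`; // assumed naming
  const res = await fetch(url);
  if (!res.ok) throw new Error(`download failed: ${res.status}`);
  return res.arrayBuffer();
}
```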
We published an article about some of the internals of how we manage probes. This was highly appreciated by the IoT people because it gave some insight into how we do the upgrades and how we manage the measurements on the probes; there are some interesting details there, and I encourage you to go and check it out as well.
RIPE Atlas time stamps. I'm not going to go into full details but we gave more information about how and when we process your results.
A feature that is about to be rolled out, or is in the process of being rolled out, is DNS-over-TLS measurements. It seems to be a hot topic nowadays, so we thought it would be useful to support this. We don't have UI support at the moment, but we do have API support and the probes can do it as well, so if you are interested, go and check it out.
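A hedged sketch of what creating such a measurement through the v2 API could look like; the DNS-over-TLS-specific field names (for example tls and port) are my reading of the API and should be verified against the current documentation, the target server is a placeholder, and ATLAS_KEY is your own API key.

```ts
// Sketch: request a one-off DNS-over-TLS measurement from five probes worldwide.
async function createDoTMeasurement(ATLAS_KEY: string): Promise<void> {
  const body = {
    definitions: [{
      type: "dns",
      af: 4,
      description: "DoT check",
      target: "dns.example.net",      // the DNS-over-TLS server to query (placeholder)
      query_class: "IN",
      query_type: "A",
      query_argument: "ripe.net",
      use_probe_resolver: false,
      protocol: "TCP",
      tls: true,                       // assumed flag enabling DNS over TLS
      port: 853,
    }],
    probes: [{ requested: 5, type: "area", value: "WW" }],
    is_oneoff: true,
  };
  const res = await fetch(`https://atlas.ripe.net/api/v2/measurements/?key=${ATLAS_KEY}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  console.log(await res.json());
}
```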
Finally, one big chunk of work that we are working on at the moment, which is why it's not underlined, it's not a link, is scaling our internal infrastructure to deal with 10, 100, 1,000 times as many measurement results as we have today. We are using Elasticsearch for that in-house, and it looks really good. I certainly hope to come back to you and report on how that actually worked out for us at the next RIPE meeting.
Anchor VMs. As I said, anchors are somewhat bigger machines, rack mounted nowadays. They are probes as well as willing targets. So they use somewhat more bandwidth for this and they tend to be closer to the core and not to the edge.
In some cases we had partners who said: we would love to install RIPE Atlas anchors, but we cannot have a physical machine in our data centre, so that's just not going to work. So we thought about, and actually did some work on, putting the anchor functionality on VMs provided by these partners. It could be in the cloud; we actually have one in Amazon, in one of the regions, and it seems to just work. We are in the pilot phase now; that's a link, you can go in and read more about it.
We have five of these up and running. Thank you very much to Sander and the others who helped us make this happen.
We will report on this activity soon. So we plan to close the pilot phase, report to you and then see how we go further, hopefully expanding.
And then on to probes. Due to circumstances, we had to stop delivering version 3 probes, or I should say we had to stop manufacturing new ones. There are some of them circulating with ambassadors, we have a teeny tiny stock, and some of them come back, so you can still get a version 3 probe. They are still up and running and we will support them for a long time, but we no longer make new ones. Instead, we are experimenting with what we call the version 4 probe; I have one in my hand. This is a prototype casing for it, it's 3D printed. Some of you already have one, thank you for supporting us. In this phase we are trying to establish whether they are stable enough out in the field, so we are looking for volunteers who are willing to plug these in and keep them plugged in, and if something goes wrong, help us figure out what it is, so that we can then ramp up the production of these probes in the near future.
If you want one, you can still approach Michaela tomorrow morning, preferably around the info desk. She has some to give away. Not too many though, so be quick.
Measurement tagging, as I said, allows you to group your measurements. What that means is that from now on you can say that this particular measurement and that particular measurement belong together, because in some semantic way they form a logical group. For example, if you change the IP address of your server, you used to measure the old one and now you are measuring the new one, but you still want to have data for the whole thing and visualise it as one; you can do that, or at least you can almost do that completely. For now you can tag your measurements, you can search for tags, and you can stop measurements belonging to the same group, so you can treat them as one to some extent. This also comes in handy for the community, to tag various measurements with, let me say, well-known labels. So from now on you can just say: this measurement is for Google, that measurement is for Facebook. If anyone is looking for measurements against Google, they can just look for the tags and download data based on that; the system internally translates that into however many measurements it covers.
You also have a means of keeping this private. So my Google measurement is not necessarily your Google measurement. That's the difference between tags and labels. If you go and read the documentation, it will explain it a bit more clearly. But the point is, that I can group the measurements in a way that is a group for myself and no one else needs to know about it. So that's probably useful in many cases.
And we're working on upgrading the visualisations to actually take this into account, so if you are using LatencyMON, for example, your graph will no longer stop when your measurement is stopped; the group will continue and the visualisation will just do the right thing.
Okay, moving on to RIPE IPmap. This used to be called OpenIPMap, but due to trademark issues we had to rethink that, and we figured it's best to call it RIPE IPmap instead. So from now on, ipmap.ripe.net is the place to go.
This is a small example of a visualisation of a traceroute measurement. I hope you have seen previous instances of this tool; it really is visual, it tries to show you how the packets fly around the globe. It still has the features that we wanted it to have; in particular, crowdsourced input is one of the big inputs there, where we encourage you to tell us where you think those nodes are. So if you happen to spot a problem in our dataset, or you see a node that is not geolocated yet but you have an idea where it is, you can contribute that to the system, and the next user will see that result and hopefully have a better view of where things are in terms of the infrastructure. I stress that this is for infrastructure; we are not aiming at geolocating eyeballs, that's for other companies to do.
RIPE Stat: lots of things happened there. We have a renewed Looking Glass widget, because RIS, in the background, has been renewed; it's now more real-time and scalable, and we had to follow suit in RIPE Stat as well. So check out that widget; it still does what it used to do, but based on the new schema.
Zonemaster, or the DNS check widget: Zonemaster is now the back end for it, so it's no longer doing its old thing. Anand gave a short update about our switch to Zonemaster in the DNS Working Group, and RIPE Stat is following that lead as well. We also renewed the historical WHOIS widget; it was called historical RIPE DB, I believe, and is now historical WHOIS. It is entirely based on the RIPE Database API; it's basically a visualisation of the RIPE Database data that you can also access via that API.
Upstream visibility: I have to give a shout out to Roma Tre, who gave us a big hand with this one; basically this is mostly their work, but we collaborated with them. It is a widget that shows the variance of upstreams for a particular AS over time. As some of you say, and I totally believe it, boring is good, so what you want to see here is usually flat lines. If you see lots of fluctuations, that means lots of things have changed in the routing space, and that's not necessarily always good. But you can come back to this widget now and check whether that is the case or not.
This has been released some days ago, so, please give us feedback if you think it's good or not.
Country reports. We are expanding RIPE Stat to give you more data about countries in particular. And now we have more regions, some lines about France in the bottom as well.
Scalability is one of the next challenges that we face in RIPE Stat. As you can see, around mid-2016 there was a big increase in usage, from 1 or 2 million queries a day to 55 or 56 million queries a day. That's a lot of queries, and I have to say it's difficult to keep up with that, so one of the focus points for the RIPE Stat team will be to solve this problem and let us grow further. I'm told that some application, some kind of mobile app, integrated RIPE Stat into itself and that just generates load for us, which is good.
And finally, if you want to follow what's happening in RIPE Stat, I encourage you to subscribe to this Twitter account; Christian posts lots of news there.
Finally, end-to-end user connections. This is newish: an old idea in a new form that two of our colleagues have been putting a lot of energy into. What it does is use RIPE Atlas probes to measure the end-to-end connectivity in a particular country: it uses all the public probes in the country and tries to measure towards all, or most, of the other public probes in the country.
We call them sketches because this is not complete information, but it is as much as we can determine with RIPE Atlas. We are using datasets from Atlas itself, RIPE Stat, CAIDA and APNIC; in particular, APNIC is contributing the proportions of users in a particular country, and we are very grateful to be able to use that dataset. These are highly useful links, and I encourage you to check them out. The first one lets you select your country if you want; it's not just limited to France, you can enter your country and your day. The second one goes to the most recent French results, and the third one takes you to the source.
So, just to flesh out what you can find: this is France as of 1 March. Around the perimeter of the doughnut, or circle, whichever way you want to see it, you can see the providers in that country, the eyeball providers. Some non-eyeball providers are also categorised as eyeball providers because they may have proxies in their network, and it's really rather difficult to determine whether that's the case or not, so we just say those are eyeball providers in the network as well.
The size is proportional to the market share as measured by APNIC, so I have to say it's ballpark correct; it's not necessarily precise but it's good enough. And the lines show whether there are direct relationships between those entities. For example, here you can see that these two are connected. That's a dashed line, so they are not directly connected; there are some hops in between which we couldn't identify, but since we don't know any better, we say that they have a direct relationship. Whereas some of these lines go through IXPs, for example, or some kind of transit network.
I can't go into all of the details here, but I'm sure the team will publish more in the documentation as well.
With this, I will take questions.
CHAIR: Thank you very much. Any questions?
AUDIENCE SPEAKER: Andrei. You have mentioned anchors in the form of VMs; are there any plans for regular probes as VMs?
ROBERT KISTELEKI: The answer is almost. We are looking into making software probes, which in the ideal case could be software packages that you can deploy on your router or server or whatever. Which is essentially the same thing. I'm not making promise at the moment, but we are looking into this and we are planning to partner up with people who can help us with the packaging.
AUDIENCE SPEAKER: Alex Andrei. RIPE Stat data is available since 2003, something like that. Are there any plans to integrate older historical data? Because I think it should be available for at least ten years before that.
ROBERT KISTELEKI: Which dataset of that?
AUDIENCE SPEAKER: Well, I can show you the requests, but it concerns information about autonomous system numbers.
ROBERT KISTELEKI: Generally speaking, we are trying to go as far back as possible in history. If you say that there is more history that we should take into account, then by all means please contact us and we'll see what we can do about it.
AUDIENCE SPEAKER: Okay.
AUDIENCE SPEAKER: Hello, Simon from France-IX. A question regarding the anchor VMs: can we expect, in the future, to replace a physical server with a VM?
ROBERT KISTELEKI: That's a very good question. The point of the VMs is not to replace existing anchors but to complement them. However, it is entirely plausible that in the future, for example when your physical hardware expires, after three or four years or so, once we see that the VMs are just good enough, we will say: sure, if you want to go for a VM, go for a VM; if you want to go for hardware, go for hardware. I don't know if that's going to be the case; it also depends on the outcome of this pilot. I personally have faith that the VMs will work out fine, but there is a lot of work to be done before I can make that statement clearly. So I think yes, but I will not commit to that just yet.
CHAIR: All right. Thank you very much Robert.
(Applause)
ANDREI ROBACHEVSKY: Good afternoon. I did this presentation at the Routing Working Group. I hope ‑‑ well this is a presentation to the measurements community, I hope the overlap is not too big so I'm not boring you with this presentation again.
Anyway, I'll be quick. This is really a lightning talk; it's more a question. We are approaching this project with a question: how can we measure routing security, in the sense of how it relates to MANRS? So it's not routing security in general, but how networks implement MANRS in, well, in their infrastructure. That's the main question.
And the motivation for this project is threefold. One is to answer the question: what is the actual state of routing security as it relates to MANRS? When we talk about routing security, we talk about it in terms of anecdotes, and this analysis is very important to understand how attacks happen, what vulnerabilities we have and how we can mitigate them, but unfortunately it doesn't tell us how the system is developing. A few of us had this discussion and some said, well, we have had this problem for 20 years and nothing has happened. Is it true that nothing has happened, or are we becoming more secure?
Another thing is that with this MANRS effort it's not clear to what extent the people on the list, the MANRS members, are committed. They might have been committed when they joined this effort; are they still committed? So, this is an important question for the reputation of this effort, and therefore data would help.
And finally, well not everyone knows, but if you want to join MANRS, you have to pass certain tests and we check, we do have check and balances. But those checks are manual. They are not very comprehensive and they are not consistent, so, with this data, we hope to make those checks as well.
And well from this sort of objective and from this motivation, there are two things that come to mind. It should be transparent so there should be no hidden things, the sort of credibility of those measurements should be supported by transparency and another one, they should be passive. So, it doesn't require cooperation from networks especially if we want to measure routing security or insecurity on a global scale.
So, as I said, it relates to MANRS actions. Here is a list of MANRS actions against which we plan to measure routing security.
Now, the question, and this is a question for this community, is more about asking you for advice and feedback, and if you know about similar projects, please talk to me; I'd really appreciate your feedback.
I'll concentrate basically on just the routing stuff, because the other stuff is easier in a way. So, when it comes to routing, and with those high-level requirements that it should be transparent and passive, the obvious thing that comes to mind is that you can look at the changes in the routing system and identify events related to possible hijacks and possible route leaks. You can also easily check on Bogon announcements, announcements of Bogon address blocks and Bogon autonomous system numbers. Those things you can check. If you have ideas about other metrics that would be helpful here to identify the security posture of a particular network, that would be helpful as well.
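A minimal sketch of such a Bogon check, assuming a locally maintained Bogon prefix list and treating the private-use ASN range plus AS0 as Bogon origins (the real lists and the announcement feed from route collectors would have to come from elsewhere), could look like this:

    import ipaddress

    # Illustrative Bogon data only; a production check would use a full, regularly
    # refreshed Bogon feed.
    BOGON_PREFIXES = [ipaddress.ip_network(p) for p in
                      ("10.0.0.0/8", "192.168.0.0/16", "198.51.100.0/24")]
    BOGON_ASNS = set(range(64512, 65535)) | {0}   # private-use ASNs and AS0

    def bogon_violations(announcements):
        """announcements: iterable of (origin_asn, prefix) pairs seen in BGP."""
        hits = []
        for origin, prefix in announcements:
            net = ipaddress.ip_network(prefix)
            if origin in BOGON_ASNS:
                hits.append((origin, prefix, "bogon origin ASN"))
            elif any(net.overlaps(bogon) for bogon in BOGON_PREFIXES):
                hits.append((origin, prefix, "bogon prefix"))
        return hits

    # Example: a private-use origin ASN and an announcement of RFC 1918 space.
    print(bogon_violations([(64512, "203.0.113.0/24"), (3320, "10.1.0.0/16")]))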
But if you look at those metrics, the question is how you calculate them. That's another thing where you have a few choices. One of the choices is to look at the impact. So, an event happens, probably an incident, and what is the impact created on the routing system? If you look from the impact point of view, there are several questions that are very, very difficult to answer, because not all prefixes are equal, right, and size doesn't necessarily matter all the time. It really depends on particular analysis in a particular case.
So, it's very hard to identify this metric, make it measurable and, especially, define certain thresholds to say this is acceptable, this is not acceptable, that kind of thing.
So, right now we are thinking not to go this way, not to look at the impact. Besides, the objective of this effort is really to look at the security posture related to conformance, and for conformance it doesn't matter whether you announce or hijack a big prefix or a small prefix; you leaked the stuff, and that means that you do not have controls in place, right.
So you are non-conformant in a way.
So we're going down this road, and it's easier in a way: it's easier to define the thresholds. And of course resolution time is very important, because the quicker you react, the less havoc or pain you inflict on the rest of the Internet, and the more security processes you evidently have in place.
Now, another dimension to look at this from is, I mentioned events, right; you have events, and if you look at monitoring tools like BGPmon, for instance, you get a bunch of distinct events which, in fact, if you look deeper, constitute one incident. One fault, one error caused subsequent events, and those events may be seen as different events because they involve different prefixes, maybe different ASes, different AS paths; they propagated through the Internet to the route collectors at different times, and therefore they are seen by the monitoring system as distinct events. It doesn't make sense to count them as distinct events, because they indicate one single fault. Therefore, we were thinking about how to combine them, and there are some ideas; I'll show them in the diagram.
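A minimal sketch of collapsing monitoring events into incidents is below. It assumes each event is a dict carrying a suspected culprit ASN plus start and end timestamps; the grouping heuristic (same culprit, at most an hour's gap between events) is an assumption for illustration, not the project's final rule:

    from datetime import timedelta

    def group_into_incidents(events, gap=timedelta(hours=1)):
        """Merge events with the same culprit that follow each other closely."""
        incidents = []
        for ev in sorted(events, key=lambda e: e["start"]):
            for inc in incidents:
                if inc["culprit"] == ev["culprit"] and ev["start"] - inc["end"] <= gap:
                    inc["events"].append(ev)
                    inc["end"] = max(inc["end"], ev["end"])
                    break
            else:
                incidents.append({"culprit": ev["culprit"], "events": [ev],
                                  "start": ev["start"], "end": ev["end"]})
        return incidents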
Another idea is to damp the penalty for an incident depending on how far away from you the culprit is. If it's your direct customer, which you have more control over, you get the full blame. If it's further away, hops away, then less of that.
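A minimal sketch of that dampening; the talk only says the penalty should shrink with distance, so the halving-per-hop factor here is an assumption:

    def dampened_penalty(base_penalty, hops_beyond_customer):
        """Full blame for a direct customer; each additional hop halves the penalty."""
        return base_penalty * (0.5 ** hops_beyond_customer)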
And finally, defining the thresholds: these are pretty arbitrary numbers, but we need to run this and see how it plays out. If you react quickly, within the 30-minute window, that probably means that your operations work well and you can coordinate mitigation of those incidents, and you get half of the weight. And if you are not acting, that is penalised.
This is the diagram of how those events are combined into incidents and weighted.
So, the first one is reacted to very quickly; you get 0.5. Those four events are actually one incident which lasted less than 24 hours; you get 1 for that. And those on the right, those three events, are considered one incident again, and it lasted more than 24 hours.
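A minimal sketch of the per-incident weights just described: 0.5 for an incident resolved within the 30-minute window, 1 for one lasting under 24 hours. The weight for incidents lasting longer than 24 hours is not stated in the talk, so the value 2 here is an assumption:

    from datetime import timedelta

    def incident_weight(duration):
        """Weight an incident by how long it lasted before resolution."""
        if duration <= timedelta(minutes=30):
            return 0.5
        if duration < timedelta(hours=24):
            return 1.0
        return 2.0  # assumed penalty for incidents lasting more than 24 hours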
So, I think that's it. Thank you. And as I said, it's more sort of an announcement to ask for help and advice.
CHAIR: Thank you very much.
AUDIENCE SPEAKER: Daniel Karrenberg, also a routing researcher. That's very interesting, and thank you for seeking advice. I was wondering how you define resolution, and let me explain why. If you define resolution by, basically, the prefix you don't own goes away, or the wrongly announced prefix goes away, the cause of it might not be a resolution but basically that the spoofer stopped, and what we see in practice is that some of this is really only there for five or ten minutes, and it is repeatedly there. So, if you want to grade ASes, I think you have to take into account not only incidents and their duration, but also the frequency of the incidents themselves, and it's really dangerous to assume that if an incident stopped it was due to mitigation. So that would be my advice.
ANDREI ROBACHEVSKY: You are right. I mean the indication that you have subsequent incidents actually means that you are negligent, right, even if you resolve those incidents in very quick time.
DANIEL KARRENBERG: That should probably be penalised.
AUDIENCE SPEAKER: Randy Bush. I suggest looking into active measurements.
ANDREI ROBACHEVSKY: Can you elaborate?
RANDY BUSH: Yeah, once we have done the IMC submission next Friday. But you can announce spoofs, you can arrange hijacks with the consent of the hijackee, etc.
ANDREI ROBACHEVSKY: So, I'll come to you next week when you publish the IMC paper.
RANDY BUSH: After next week.
CHAIR: The deadline is Saturday morning at 2 a.m.
ANDREI ROBACHEVSKY: I think that's a way to go.
RANDY BUSH: And that's pretty ‑‑ when you have got one of those, you have got ‑‑
ANDREI ROBACHEVSKY: True.
RANDY BUSH: Got them by the neck.
ANDREI ROBACHEVSKY: I think for MANRS members that would apply. I cannot see how that could be applied on a global scale, though.
CHAIR: I have a question. I'm going to try to put a little bit of the work back into the Working Group. Is there a document that you have, a working document of this? Did that go to the list?
ANDREI ROBACHEVSKY: Not yet. So the idea was I collect feedback here, update this document and send this to the Working Group.
CHAIR: Please do that, because I'd like to continue this discussion not only in the hallways here but also out on the list, so that in Amsterdam we can come back to it and have a look at the prototype measurements as they are running.
All right. If there are no other questions. Thank you very much.
ANDREI ROBACHEVSKY: Thank you.
(Applause)
CHAIR: We have got, I think, four minutes for any other business. Is there any other business? No? Okay. Thank you very much. We'll just give you three and a half minutes of your time back. Enjoy dinner and the rest of the meeting. Thanks a lot.
And don't forget to rate the talks.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.