diff --git a/doc/repport.tex b/doc/repport.tex
index c7d0262a2668aa22d30c38c4e75e4e1cf591a6d9..f5913331a0351c1c1b74aa82cac54eb74f98f849 100644
--- a/doc/repport.tex
+++ b/doc/repport.tex
@@ -16,44 +16,47 @@
 %% pkg 
 \usepackage{graphicx}
 \usepackage{wrapfig}
+\usepackage{float}
 \graphicspath{ {./images/} }
 \usepackage[utf8]{inputenc}
 \usepackage[document]{ragged2e}
 
 \begin{document}
+\justify
 
 \title{Measuring the performance of distributed web services from geographically dispersed clients}
 
 \author{Axel Gard}
 \email{axega544@student.liu.se}
-\affiliation{%
+\affiliation{
   \institution{Link{\"o}ping University}
-  \city{Link{\"o}ping}
-  \country{Sweden}
+  %\city{Link{\"o}ping}
+  %\country{Sweden}
 }
 
 \author{Martin Gustafsson}
 \email{margu424@student.liu.se}
-\affiliation{%
+\affiliation{
   \institution{Link{\"o}ping University}
-  \city{Link{\"o}ping}
-  \country{Sweden}
+  %\city{Link{\"o}ping}
+  %\country{Sweden}
 }
 
 \author{Joseph Hughes}
 \email{joshu135@student.liu.se}
-\affiliation{%
+\affiliation{
   \institution{Link{\"o}ping University}
-  \city{Link{\"o}ping}
-  \country{Sweden}
+  %\city{Link{\"o}ping}
+  %\country{Sweden}
 }
 
 \author{Axel Wretman}
 \email{axewr193@student.liu.se}
-\affiliation{%
+\affiliation{
  \institution{Link{\"o}ping University}
- \city{Link{\"o}ping}
- \country{Sweden}}
+ %\city{Link{\"o}ping}
+ %\country{Sweden}
+}
 
 %%
 %% Abstract:  
@@ -63,79 +66,178 @@
 %% 4) What are the implications/value of your results/findings? 
 %%
 \begin{abstract}
-Since modern web services tend to serve a worldwide audience, an increasing amount are choosing to distribute them among data centers across the globe. It is therefore important to be able to test the performance of these services from different locations and under varying conditions. We set out to create a test suite to do just that, to measure the performance of distributed web services from geographically dispersed clients. The study finds that client location has a material impact on performance, despite using a distributed model, and that clients in Europe and North America are generally more performant than clients in Asia, South America and the Middle East.
+
+Since modern web services tend to serve a worldwide audience, an increasing number are choosing to distribute them among data centers across the globe. It is therefore important to be able to test the performance of these services from different locations and under varying conditions. We set out to create a test suite to do just that: to measure the performance of distributed web services from geographically dispersed clients. The study finds that client location is one of many factors that contribute to the performance of a web service. It also finds that, in general, clients in Europe and Asia see better reply times than clients located in central Canada, South America, and along the North American east coast. However, these results are not conclusive, and further research is needed before a final conclusion can be drawn.
 \end{abstract}
 
 \maketitle
-\tableofcontents
-\clearpage
+%\tableofcontents
+%\clearpage
 
 \section{Introduction}
-A major problem of centralized web services is being able to offer good performance to connections originating from a wide range of geographic locations. This is among the reasons that many web services today are choosing to adopt a distributed model by implementing it themselves or by deploying the service on \textit{Infrastructure as a Service} providers such as Amazon Web Services and Microsoft Azure.\\[10pt]
+A major problem for centralized web services is offering good performance to connections originating from a wide range of geographic locations. This is among the reasons that many web services today choose to adopt a distributed model, either by implementing it themselves or by deploying the service on \textit{Infrastructure as a Service} providers such as Amazon Web Services (AWS) and Microsoft Azure.\bigskip
 
-The purpose of this paper is to provide a methodology for testing the performance of distributed web services from multiple geographical locations. This is done by developing a test suite to be deployed on data centers dispersed across the globe. The different instances of the suite then connect to a specified web service and measure the performance of its connection. The results are then compiled and compared which lets us observe how the performance varies by location. The relevant metrics are data throughput and round-trip time (RTT) and error rate, depending on overall network activity and some client-server distance heuristic.\\[10pt]
+The purpose of this paper is to provide a methodology for testing the performance of distributed web services from multiple geographical locations. This is done by developing a test suite to be deployed on data centers dispersed across the globe. The different instances of the suite then connect to a specified web service and measure the performance of the connection. The results are then compiled and compared, which lets us observe how the performance varies by location. The relevant metrics are data throughput, reply time and error rate, studied in relation to a client-server distance heuristic.\bigskip
 
-In this paper, we show that the performance of distributed web services are highly dependent on the location of the client, with poor performance identified for clients in sparsely populated areas and/or areas with poor network infrastructure. We also conclude that for as long as the client stays inside Europe and North America the performance of the web service is negligible as it is within the margin of error. However, when we test the performance of these same services from either Asia, South America or the Middle East we find notably varying results due to large differences in network infrastructure and population density. Furthermore we conclude that as the web service grows in size the location of the client becomes less relevant to the performance of the web service, this may be because larger web services have more servers in close proximity to the client. reducing the distance between the client and the server.\\[10pt]
+In this paper, we show that the performance of distributed web services is highly dependent on the location of the client, with poor performance identified for clients in sparsely populated areas and/or areas with poor network infrastructure. We also conclude that as long as the client stays inside Europe and North America, the variation in the performance of the web service is negligible, as it is within the margin of error. However, when we test the performance of these same services from Asia, South America or the Middle East, we find notably varying results due to large differences in network infrastructure and population density. Furthermore, we conclude that as a web service grows in size, the location of the client becomes less relevant to its performance. This may be because larger web services have more servers in close proximity to the client, reducing the distance between the client and the server.\bigskip
 
 \subsection{Structure of the paper}
-The remainder of this paper is structured as follows, in section \ref{section:methodology} we present our strategy for how we created our test suite, what experiments we did and how those experiments were conducted. Further more in section \ref{section:results} we present the findings of those experiments and how those experiments can be examined. Then during section \ref{section:conclusions} we look at what conclusions can be drawn from the results presented earlier and consider the implications of the results and the conclusions, both to individual clients but also to society at large. 
-
-\section{Hypothesis} 
-\label{section:Hypothesis}
-We expect to find a material correlation between a client's geographical location and a distributed service's network performance. We also expect that round-trip time and jitter suffer as a result of increased an increased geographical distance between a given client and server simply due to the velocity that the data is physically transmitted at. Meanwhile, throughput and error rate are dependant on a weakest link, with the likelihood of throughput throttling or packet drops increasing as more links are introduced into the chain. \\[10pt]
-
-While it is reasonable to assume that the distribution of web services alleviates the issue of locality, we do not think that it solves the problem completely. In some cases, locations with a low population density may not have an available data center nearby for the web service to be hosted on. In other cases a web server may be hosted on a strong network that is physically close to a client, yet still suffer major bottlenecking on the last mile due to poor network infrastructure. We expect to find that the web services that serve large-scale international userbases are less sensitive to differences in its user's geographical location.
-
+The remainder of this paper is structured as follows. In section \ref{section:methodology} we present our strategy for creating our test suite, what experiments we did, and how those experiments were conducted. Furthermore, in section \ref{section:results} we present the findings of those experiments and how they can be interpreted. Finally, in section \ref{section:conclusion} we look at what conclusions can be drawn from the results presented earlier and consider the implications of the results and the conclusions, both for individual clients and for society at large.
 
 \section{Methodology} 
 \label{section:methodology}
-To design a test suite that would be able to capture the metrics needed to measure the performance of modern distributed web services, we needed multiple clients that could connect to a service's distributed servers. We therefore had to design distributed network of test clients that could collect the desired measurements. This presented a variety of design challenges that meant bringing together a number of different tools and services to accomplish this task.
+To design a test suite that would be able to capture the metrics needed to measure the performance of modern distributed web services, we needed multiple clients that could connect to a service's distributed servers. We therefore had to design a distributed network of test clients that could collect the desired measurements. This presented a variety of design challenges that meant bringing together a number of different tools and services to accomplish this task.
 
 \subsection{Clients}
-Being interested in the importance of the geographical locations of server and client in the use of the target service it was required of our test suite to be distributed across multiple servers around the world. The easiest and most affordable way to build such a structure was with the use of Amazon's AWS EC2 instances. We decided to use Linux hosts running Ubuntu Server 16.04 LTS and being of instance type t3.micro. This meant that each client had two virtual CPUs of type Intel Skylake P-8175 running at a clock frequency of 2.5 GHz, one GiB of memory and up to five Gigabit of network bandwidth. \\[10pt]
+Since we were interested in the importance of the geographical locations of the server and the client when using the target service, our test suite had to be distributed across multiple servers around the world. We chose to achieve this structure with Amazon's AWS EC2 instances. We decided to use Linux hosts running Ubuntu Server 16.04 LTS on the general purpose instance type t3.micro. This meant that each client had two virtual CPUs of type Intel Skylake P-8175 running at a clock frequency of 2.5 GHz, one GiB of memory and up to five gigabits per second of network bandwidth. Since our clients use powerful hardware and a strong network connection, the client itself should not introduce any bottlenecks. \bigskip
 
-Amazon also offers their Software Development Kit (SDK) as a means for developers to interact with EC2 instances to both start and stop instances programmatically. With their Software Development Kit it is also possible to run commands on each instance from any computer with the right credentials meaning that we could start run a script on each instance from a centralized computer. The scripts could then act as our distributed test clients and collect measurements before returning their result to a central host where the results are stored and processed.
+Amazon also offers a Software Development Kit (SDK) that lets developers start and stop EC2 instances programmatically. The SDK also makes it possible to run commands on each instance from any computer with the right credentials, meaning that we could start a script on each instance from a centralized computer. The scripts could then act as our distributed test clients and collect measurements before returning their results to a central host where they are stored and processed.
 
 \subsection{Scripts}
-The scripts running on the EC2 servers were not required to be especially complex in and of themselves. They needed to be remotely started from a central client, then perform some network tasks that returned relevant data, before sending said data back to the original client that launched the operation. Before deciding on what language should be used for these scripts we first focused on what tools were capable of performing the required tests. One of the first such tools that we found was httperf. A widely used and robust network measurement tool that is capable of performing benchmarks on both server and network performances \cite{MDJ98}. This together with the tool ipinfo that can be used to find the geographical location of a given IP-address gave us all the measurement data that we required. \\[10pt]
+The scripts running on the EC2 servers were not required to be especially complex in and of themselves. They needed to be remotely started from a central client, then perform some network tasks that returned relevant data, before sending said data back to the original client that launched the operation. Before deciding on what language should be used for these scripts, we first focused on what tools were capable of performing the required tests. One of the first such tools that we found was httperf, a widely used and robust network measurement tool that is capable of benchmarking both server and network performance \cite{MDJ98}. This, together with the tool ipinfo, which can be used to find the geographical location of a given IP address, gave us all the measurement data that we required. \bigskip
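+
+As an illustration, the following minimal sketch shows how a server's coordinates, and from them the client-server distance, could be obtained with the ipinfo Python library; the access token and IP address are placeholders, and the haversine helper is our own addition rather than part of the tool:
+
+\begin{verbatim}
+from math import asin, cos, radians, sin, sqrt
+import ipinfo
+
+def haversine_km(lat1, lon1, lat2, lon2):
+    # great-circle distance between two points
+    lat1, lon1, lat2, lon2 = map(
+        radians, (lat1, lon1, lat2, lon2))
+    a = (sin((lat2 - lat1) / 2) ** 2
+         + cos(lat1) * cos(lat2)
+         * sin((lon2 - lon1) / 2) ** 2)
+    return 2 * 6371 * asin(sqrt(a))
+
+handler = ipinfo.getHandler("ACCESS_TOKEN")
+server = handler.getDetails("203.0.113.1")
+client = handler.getDetails()  # our own public IP
+distance = haversine_km(
+    float(client.latitude), float(client.longitude),
+    float(server.latitude), float(server.longitude))
+\end{verbatim}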
 
-Both of these tools have libraries for the Python programming language. Python also has an extensive collection of other useful networking libraries and has native support on the AWS instances that we use as our test clients. All of the members of the research group was also highly familiar with the language making it the strongest candidate and the language that we finally decided on using for our scripts. \\[10pt]
+Both of these tools have libraries for the Python programming language. Python also has an extensive collection of other useful networking libraries and has native support on the AWS instances that we use as our test clients. All of the members of the research group were also highly familiar with the language, making it the strongest candidate and the language that we finally decided on using for our scripts. \bigskip
 
-To fulfill our capability requirements we needed two separate scripts. One base script that could be started on a central base client and one that could execute on all of our remote testing clients. The role of the base script was to both initiate and manage the connections to our test clients. Upon completion of our benchmarks it also needed to extract the collected data from the remote clients so that they could be calculated upon and then be neatly compiled on the base client. As mentioned previously this script uses the AWS SDK to interact with the clients, so naturally the first thing that is done is to start the clients and initialize them. After the clients are started we fetch any new changes to the remote client script and install all the necessary requirements. Then we run the remote client script on each client and wait for the results to be compiled. Once the results are in from each client we can then power these off as they have now completed their part of the work. \\[10pt]
+To fulfill our capability requirements we needed two separate scripts: one base script that could be started on a central base client, and one that could execute on all of our remote testing clients. The role of the base script was to both initiate and manage the connections to our test clients. Upon completion of our benchmarks it also needed to extract the collected data from the remote clients so that it could be processed and neatly compiled on the base client. As mentioned previously, this script uses the AWS SDK to interact with the clients, so naturally the first thing that is done is to start and initialize the clients. After the clients are started we fetch any new changes to the remote client script and install all the necessary requirements. Then we run the remote client script on each client and wait for the results to be compiled. Once the results are in from each client we can power these off, as they have now completed their part of the work. \bigskip
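+
+A minimal sketch of this orchestration, assuming the boto3 Python SDK and instances reachable through AWS Systems Manager (the region, instance ID and script name are placeholders):
+
+\begin{verbatim}
+import boto3
+
+ec2 = boto3.client("ec2", region_name="eu-north-1")
+ssm = boto3.client("ssm", region_name="eu-north-1")
+ids = ["i-0123456789abcdef0"]  # test clients
+
+# start the clients and wait until they are up
+ec2.start_instances(InstanceIds=ids)
+ec2.get_waiter("instance_running").wait(
+    InstanceIds=ids)
+
+# run the remote client script on every instance
+ssm.send_command(
+    InstanceIds=ids,
+    DocumentName="AWS-RunShellScript",
+    Parameters={"commands": [
+        "python3 client.py www.example.com"]},
+)
+
+# ... collect the result files, then power off
+ec2.stop_instances(InstanceIds=ids)
+\end{verbatim}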
 
-So now that we have all the results we go through each client separately and create a graph were we plot achieved throughput as a function of number of calls per second to the server. We then compare graphs from different clients to the same server in order to see if client location has an impact on the achieved throughput. We also plot the percentage of calls that fail as a function of number of calls per second as well and we should see that as throughput peaks and starts dropping the error percentage should increase as more and more connections time out. We then again compare this between different client locations to see if there is a difference in error rate between different client locations. We also want to plot response time as a function of number of calls per second in order to find out if response time increases at the same rate when comparing different client locations but also if the response time levels out at the same level between different client locations. \\[10pt]
+Now that we have all the results, we go through each client separately and create a graph where we plot achieved throughput as a function of the number of calls per second to the server. We then compare graphs from different clients to the same server in order to see if client location has an impact on the achieved throughput. We also plot the percentage of calls that fail as a function of the number of calls per second; as throughput peaks and starts dropping, the error percentage should increase as more and more connections time out. We again compare this between client locations to see if there is a difference in error rate. Finally, we plot response time as a function of the number of calls per second in order to find out whether response time increases at the same rate across client locations, and whether it levels out at the same level. \bigskip
 
-The script running on the remote testing clients needed to only do three things. Perform network tests towards a target service, do the required calculations on the resulting data and save the final results to a file on the system. As previously mentioned the Python library for httperf did most of the heavy lifting when it came to collecting measurements. What the script does is perform tests were the numbers of requests per second is increased linearly until we see the achieved throughput start to drop off. We then compile the results of the tests into a file which we can then access from the base script in order to the compile the results in the graphs mentioned above. \\[10pt]
+The script running on the remote testing clients only needed to do three things: perform network tests towards a target service, do the required calculations on the resulting data, and save the final results to a file on the system. As previously mentioned, the Python library for httperf did most of the heavy lifting when it came to collecting measurements. The script performs tests where the number of requests per second is increased until it reaches a maximum. We then compile the results of the tests into a file, which we can access from the base script in order to compile the results into the graphs mentioned above. \bigskip
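+
+A sketch of one such measurement step, invoking httperf through a subprocess; the flags follow the httperf manual, while the output parsing and the swept rates are simplified assumptions:
+
+\begin{verbatim}
+import re
+import subprocess
+
+def measure(server, rate, conns=1000):
+    # one httperf run at a fixed request rate
+    out = subprocess.run(
+        ["httperf", "--server", server,
+         "--rate", str(rate),
+         "--num-conns", str(conns)],
+        capture_output=True, text=True).stdout
+    reply = float(re.search(
+        r"Reply time \[ms\]: response (\S+)",
+        out).group(1))
+    errors = int(re.search(
+        r"Errors: total (\d+)", out).group(1))
+    return reply, 100 * errors / conns
+
+samples = [(r, *measure("www.example.com", r))
+           for r in range(1000, 10001, 1000)]
+\end{verbatim}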
 
 \subsection{Measurements}
-Since the majority of modern services have servers across the globe that are all considered high performance, the differing factor when changing the geographical location of the client will be the performance of the network between the client and the server rather than then performance of the server itself. We therefore chose metrics that would give us insight into this network performance. The key measurements collected by our test clients were the throughput of the network, the round-trip time (RTT) of communications with the server, the latency deviation or jitter in the communications as well as the error rate. \\[10pt]
+Since the majority of modern services have servers across the globe that are all considered high performance, the differing factor when changing the geographical location of the client will be the performance of the network between the client and the server rather than the performance of the server itself. We therefore chose metrics that would give us insight into this network performance. The key measurements collected by our test clients were the reply time and the error rate of requests, as functions of the number of requests sent per second and the client-server distance. \bigskip
+
+The requests that httperf sends are very simple HTTP requests, meaning that it measures how many requests a server can handle, as opposed to simply transferring large files. Therefore the request rate, the number of requests sent per second, is a key variable that is changed across measurements. The reply time is simply the time it takes from sending the first request to getting the first reply back. An error, as referred to in this methodology, is a blanket term for issues such as client or socket timeouts, refused or reset connections, DNS issues, and more. The error rate is then the percentage of connections that produce an error.\bigskip
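+
+For each measurement run, the error rate is thus derived from the totals reported by httperf as
+\[ \textit{error rate} = 100 \cdot
+   \frac{\textit{total errors}}{\textit{total connections}} \]
+which is also how our graphing script computes it.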
+
+The throughput of the network describes the volume of data that can be streamed from one end to the other across the network during a period of time. It is dependent on the weakest link in the connection between the client and the server, since the higher-performing nodes cannot push information at a higher rate than the information can traverse the rest of the connection without an increased error rate. Although we did not measure network throughput directly, it can be derived from the data gathered on request rate, reply time and error rate. When sending few requests per second, the reply time tends to be low and the error rate low or non-existent. As the request rate is increased, the reply time usually increases as well, until a certain threshold is reached. When the request rate reaches the threshold, the error rate and reply time spike upwards, indicating that the connection cannot handle a higher throughput.\bigskip
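+
+The following sketch captures this inference; the tolerated error rate of 10 percent is our own choice rather than a standard definition:
+
+\begin{verbatim}
+def saturation_rate(samples, tol=10.0):
+    """samples: (request rate, reply time,
+    error rate [%]) sorted by request rate."""
+    for rate, _, err in samples:
+        if err > tol:
+            return rate
+    return None  # never saturated in range
+\end{verbatim}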
+
+%sThe time it takes for data to get transported from one end of the connection to the other is called latency. Our last measurement, latency deviation, describes just how much this latency changes between each packet that is sent over the network. Ideally the connection should have a consistent latency that can be predicted and used when configuring network nodes or handle a stream of multiple data packets. A high latency deviation causes even streams of data to become lumpy and uneven. A unexpected increase in delay for data to reach its destination can also cause issues if the destination node is configured to wait a specific amount of time. The deviation can make the latency exceed this waiting time causing the loss of the sent information.
+
+\subsection{Tested services and clients}
+Our results include measurements made on the following web services.
+\begin{itemize}
+    \item google.com
+    \item wikipedia.org
+    \item skatteverket.se
+    \item bbc.co.uk
+    \item grab.com
+\end{itemize}
+\bigskip
+
+The clients used in the measurements were hosted in the following AWS regions.
+\begin{itemize}
+    \item Southern Middle East (Bahrain)
+    \item Eastern South America (São Paulo)
+    \item Southern Africa (Cape Town)
+    \item Central Canada (Central)
+    \item Eastern US (N. Virginia)
+    \item Western US (N. California)
+    \item Central Asia Pacific (Hong Kong)
+    \item Northern Asia Pacific (Tokyo)
+    \item Southern Asia Pacific (Singapore)
+    \item Southern Europe (Milan)
+    \item Northern Europe (Stockholm)
+    \item Western Europe (Paris)
+\end{itemize}
 
-The throughput of the network describes the volume of data that can be streamed from one end to the other across the network during a period of time. It is dependant on the weakest link in the connection between the client and the server. This is because the more high preforming links can't push information at a higher rate than the information can traverse through the rest of the connection without resulting in an increased error rate. With errors being a blanket term for issues such as client or socket timeout, refused or reset connections, DNS issues, and more.\\[10pt]
+\subsection{Limitations}
+Because we use AWS to test from different parts of the world, the results we acquired will differ somewhat from those of a more typical client. For example, if a service is also using AWS, we could possibly be prioritised on the network compared to a client on a non-AWS network. This means that some results might differ for services that are also using AWS. It also means that we are incapable of testing from remote locations with sparse network infrastructure, since most AWS servers are located in busier network regions around the world. Another issue is that a lot of web services are themselves hosted on AWS. When measuring such services, the result does not reflect a real client, who might be close to the data center but usually not inside it.\bigskip
 
-The RTT of the network describes the total time it takes between one node on the network sending a request to when it gets a response from the node that the request was meant to reach. In other words, in our case the RTT is the sum of the time it takes for information to reach the server from our client, the time the server takes to process the request and send a response, and the time again that it takes the response to reach the client from the server. \\[10pt]
+In the tests we were also limited by the tools that we used and the measurements that they produce. For example, we used ipinfo in order to get the distance between the service and the client. However, the geographical distance can differ quite heavily from the physical distance travelled in the network. There may also be some issues in how we utilize httperf. The request rate was at times arbitrarily limited by httperf, meaning that we sometimes got an actual rate of only 5 000 requests per second when asking for 10 000. This led to some inconclusive data. Since we could not measure throughput directly, it has to be inferred from other measurements, meaning that we do not have an exact value of the network throughput, but rather a threshold where it levels off. In that way we are limited by the tools that we used. \bigskip
 
-The time it takes for data to get transported from one end of the connection to the other is called latency. Our last measurement, latency deviation, describes just how much this latency changes between each packet that is sent over the network. Ideally the connection should have a consistent latency that can be predicted and used when configuring network nodes or handle a stream of multiple data packets. A high latency deviation causes even streams of data to become lumpy and uneven. A unexpected increase in delay for data to reach its destination can also cause issues if the destination node is configured to wait a specific amount of time. The deviation can make the latency exceed this waiting time causing the loss of the sent information.
+The overall time span of the testing could of course also have an impact on the results. Because we ran our tests over a short period of time in Q2 of 2020, we might see different results than if the tests had run for a longer period. It is also a differing factor that we only ran the tests in Q2; we might have gotten somewhat different results if the tests had run in another part of the year. \bigskip
 
-%\subsection{Tested services}
+\emph{(It should also be noted that in Q2 of 2020 there was a worldwide pandemic of COVID-19, and due to the lockdown of most countries, a lot of people were confined to their homes and used the internet more than normal. This meant that there was a heavier strain on the networks than there would have been without a pandemic.)}
 
-\subsection{Limitations}
-Due to that we are using AWS as a way of testing in different parts of the world the results that we acquired will have some differences when compared to if we would have been a more common client. For example if a service is also using AWS we could possible be prioritised over the network compered to if a client was on a non-AWS network. This means that some results might differ for some services that is also using AWS.\\[10pt] 
+\section{Results}
+\label{section:results}
 
-In the tests we were also limited by the tools that we used and the correlations by measurement's that they produced. For example we used IpInfo in order to get distance between the service and the client. However the geographical distance can be differing quit heavily compered to the psychical distance in the network. So in that way we are limited by the tools that we used. \\[10pt]
+\subsection{Non-distributed and semi-distributed services}
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/www.skatteverket.se_distance.png}
+\caption{\textmd{ Peak reply time [ms] from www.skatteverket.se as a function of the client's distance from the server [km].}}
+\label{skatteverket distance}
+\end{figure}
 
-The over all time span of the testing could of curse also have impact the results of the tests. Due to that we are ran our test over a shorter period of time in Q2 of 2020 we might have have differing results compared to if the test would have run for a longer period of time. Of course it is also differing factor  that we only ran the test in Q2 we might have gotten some what differing results if the test would have run in a other part of the year. \\[10pt]
+The first service we tested was the Swedish tax collection agency's website, which we assumed would be deployed locally in Sweden only, meaning that it is not distributed and can therefore serve as a comparison to the other services. As we can see in Figure \ref{skatteverket distance}, this hypothesis holds, as there is a clear trend: the further the client is from the service, the longer the reply time becomes. 
 
-\emph{(It should also be noted that in 2020 Q1 and Q2 there was a world wide pandemic of COVID-19 and due to the lock-down of most country's, a lot of people were constraint to there homes and used the internet more then normally this meant that there was a heaver strain of the networks then if there would not have been an pandemic)}\\[10pt]
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/www.skatteverket.se_Tokyo_cut.png}
+\caption{\textmd{ Reply time [ms] from www.skatteverket.se as a function of request rate [requests/s] from the client in Tokyo, 8170 km away from the server in Stockholm.}}
+\label{skatteverket error tokyo}
+\end{figure}
 
-\section{results}
-\label{section:results}
+In Figure \ref{skatteverket error tokyo} we can clearly see that when the request rate exceeds 3 000 requests per second, the error rate increases drastically as the service starts to struggle to answer all the requests in time.
+
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/www.skatteverket.se_Stockholm.png}
+\caption{\textmd{ Reply time [ms] from www.skatteverket.se as a function of request rate [requests/s] from the client in Stockholm, 5 km away from the server in Stockholm.}}
+\label{skatteverket error stockholm}
+\end{figure}
+
+Looking at Figure \ref{skatteverket error stockholm}, there are absolutely no errors when sending up to 10 000 requests per second between the client and the server, which are only 5 km apart. When comparing this to Figure \ref{skatteverket error tokyo}, it is apparent that Skatteverket does not have good coverage in the Northern Asia Pacific region, which is to be expected from a non-distributed service whose audience is almost entirely Swedish.
+
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/www.bbc.co.uk_distance.png}
+\caption{\textmd{ Peak reply time [ms] from www.bbc.co.uk as a function of the client's distance from the server [km].}}
+\label{bbc distance}
+\end{figure}
+
+When we then tested the British Broadcasting Corporation's (BBC) website, we assumed that this service would be more distributed and that the results would therefore differ from those in Figure \ref{skatteverket distance}. However, as can be seen in Figure \ref{bbc distance}, this graph is very similar to Figure \ref{skatteverket distance}, and here we also see a clear trend that reply time increases with distance from the service. The service is distributed in that it switches between a server in England and one in Germany, indicating that the service is only optimized for a European audience.
 
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/www.bbc.co.uk_Paris.png}
+\caption{\textmd{ Reply time [ms] from www.bbc.co.uk as a function of request rate [requests/s] from the client in Paris, 479 km away from the server in Frankfurt.}}
+\label{bbc request eu-west}
+\end{figure}
+
+The measurements presented in Figure \ref{bbc request eu-west} show a stable connection up until 5 000 requests per second, where some errors are introduced.
+
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/www.bbc.co.uk_Tokyo.png}
+\caption{\textmd{ Reply time [ms] from www.bbc.co.uk as a function of request rate [requests/s] from the client in Tokyo, 9562 km away from the server in Hammersmith.}}
+\label{bbc request ap-northest}
+\end{figure}
+
+The connection between BBC's server in Hammersmith and a client in Tokyo proves to be quite problematic even at relatively low request rates, which is to be expected since the distance between them is so large. \bigskip
+
+Let us consider the difference in throughput when interfacing with the BBC website from a European location as opposed to a client in the Asia Pacific. In this case the European client is in Paris and the server in Frankfurt, with a physical distance of approximately 479 km. The Asia Pacific client is located in Tokyo, with the server in London at a distance of 9 562 km. The European client handles connections below 5 000 requests/s without issue and at close to zero latency. The measurements done at 5 000 and 7 000 requests/s then suffer a relatively minor error rate of 30 percent at a slightly higher latency. The difference is staggering when looking at the Asia Pacific client. Reply times are initially high, as is expected from such a distance. What is interesting is that the error rate jumps to 30 percent at just 1 000 requests/s, increasing to almost 60 percent at 4 000 requests/s and 90 percent at 7 000 requests/s. \bigskip
+
+\subsection{Distributed services}
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/en.wikipedia.org_distance.png}
+\caption{\textmd{Peak reply time [ms] from en.wikipedia.org as a function of the client's distance from the server [km].}}
+\label{wikipedia distance}
+\end{figure}
+
+In Figure \ref{wikipedia distance} we show how the reply time from the Wikipedia service differs between locations around the world. What can be seen is that all North American clients are in close proximity to the server; however, their reply times are rather long, with the west coast having a better reply time than the other two. For our Asian clients the reply times are quite similar and very decent, with the South American client also in this group. As for the South African client, the connection from here seems quite fast considering the large distance to the server. The European clients seem to have the best connections, but Paris is rather slow compared to the other two European clients. \bigskip
+
+\begin{figure}[H]
+\includegraphics[width=0.45\textwidth]{images/results/www.google.com_distance.png}
+\caption{\textmd{Peak reply time [ms] from www.google.com as a function of the client's distance from the server [km].}}
+\label{google distance}
+\end{figure}
+In Figure \ref{google distance} we see a similar pattern as in Figure \ref{wikipedia distance}. The North American clients are still close to the server but have high reply times. The Asian clients are all clumped up in the middle, but interestingly, South America's reply time has sky-rocketed and stands out as the worst of all clients. Our South African client is still performing very well considering the distance, and the European clients still have some of the best reply times. But as before, Paris stands out among the European clients, as there is still a large gap between it and the other two. Another thing that stands out is that the Hong Kong client is in the mix with our two best European clients. \bigskip
 
 \section{Discussion}
-\label{section:Discussion}
-\textit{method and limitations} \\[10pt]
-\textit{Results (at a high level) and their implications} \\[10pt]
-\textit{Wider context} \\[10pt]
+\label{section:discussion}
+We expected there to always be a strong correlation between a client's geographical location and a distributed service's network performance. We also expected reply time to suffer as a result of an increased geographical distance between a given client and server, simply due to the velocity at which the data is physically transmitted. Meanwhile, throughput and partially the error rate are dependent on the weakest link, with the likelihood of throughput throttling or packet drops increasing as more links are introduced into the chain. \bigskip
+
+While it is reasonable to assume that the distribution of web services alleviates the issue of locality, it does not seem to solve the problem completely. In some cases, locations with a low population density may not have an available data center nearby for the web service to be hosted on. In other cases a web server may be hosted on a strong network that is physically close to a client, yet still suffer from major bottlenecks on the last mile due to poor network infrastructure.\bigskip
+
+Our data set included services of varying degrees of distribution. The results in Figures \ref{skatteverket distance} and \ref{bbc distance} show that for these services there is an almost linear correlation between distance, throughput and reply time. This is what we assumed would hold true for all services, as common sense suggests that an increase in physical distance increases the reply time, since the signal needs to travel a longer distance. However, Figures \ref{wikipedia distance} and \ref{google distance} seem to disprove this theory, as no correlation between distance and reply time could be found in them. This is very apparent in the Wikipedia distance graph: the client located in northern California is in the same state as the Wikipedia server it is connecting to, but still has a slower reply time than the client located in Stockholm. So it seems that a reply can cross the Atlantic Ocean faster than it can reach a client in the same state. \bigskip
+
+These results seem counter-intuitive, as common sense would suggest distance to be a larger factor in determining the reply time from a service. It is with this in mind that we take a look at some of the assumptions we made that may or may not be accurate and could therefore skew our findings. ipinfo was used on each client to find the location of the server we would connect to when requesting the service. However, there was no way for us to verify that this was the server we connected to when using httperf, as we would not get any information about which server responded to our request. This means that there is a possibility that the distance measurement given by ipinfo may not be accurate, as our request may not have been served by the specific server that ipinfo measured the distance to. \bigskip
+
+However, even if the distance metric may be wrong, there are still conclusions that can be drawn from the results presented earlier. Figures \ref{wikipedia distance} and \ref{google distance} show us that the reply times for requests made from North American clients along the east coast and in central Canada are far worse than for requests made from Europe and most parts of Asia. Our South American client has a similarly bad reply time, but this is somewhat expected, as the distance has now increased significantly compared to the North American clients. But the South American client's bad reply time becomes even worse when compared to the South African client: here there is an increase in physical distance but a decrease in reply time. This would suggest that either the South American network infrastructure is poor or the South African network infrastructure is well constructed, or perhaps a combination of the two. As for the Asian clients, the results seem to suggest a relatively fast reply time considering the distance, and with them all clumped up in a group, the difference in reply time between different parts of Asia seems minimal. Europe seems to be the clear winner when looking at reply time, at least until the client in Paris gets involved. The client in Paris appears to have a reply time that is consistently longer than the other European clients, which suggests that our results may not be conclusive, or simply that the Paris client is located in an area with a weak network infrastructure.
+\bigskip
+After performing our measurements on the Grab service, we realized that it is hosted on AWS along with our clients, meaning that we were measuring the internal performance of the data centers. This is not a realistic use case, as real clients are typically not hosted within the same data center, making these measurements inconclusive.
+
+\section{Conclusion}
+\label{section:conclusion}
+We conclude that the reply time from a web service is complicated to measure, as it seems to depend on several different factors. We also conclude that distance is one of the factors that contribute to reply time, and that for some services distance is the strongest factor in determining the reply time, but that this does not hold in general. In order to increase the performance of a distributed service globally, it is therefore important that it has servers close to its users. \bigskip
+
+We further conclude that the network infrastructure in South America, central Canada, and the American east coast is lacking compared to the network infrastructure in parts of Europe and South Africa. These results are however not conclusive, and we therefore urge further research to be conducted into this topic in order to further our understanding of how responsive different services are depending on client location. These measurements also do not take into account clients in sparsely populated areas, since AWS data centers tend to be in a region's major cities. The last mile can often be the weakest link for the client, meaning that these results may not tell the full story.
 
 %%
 %% The next two lines define the bibliography style to be used, and
@@ -143,15 +245,9 @@ The over all time span of the testing could of curse also have impact the result
 \bibliographystyle{ACM-Reference-Format}
 \bibliography{sample-base}
 
-%%\cite{MDJ98}
-%% ex of img 
-%\begin{figure}[h]
-%\caption{example image}
-%\centering
-%\includegraphics[width=0.5\textwidth]{images/to.png}
-%\end{figure}
-
-\listoffigures
+%% content info 
+%\listoffigures
+%\tableofcontents
 
 \end{document}
 \endinput
diff --git a/src/graph.py b/src/graph.py
index b4b34fb3fe9599da8ec3e77efb8ffffab8805c64..0a6d1b02f750d9766ddb173132823106dfbd9471 100644
--- a/src/graph.py
+++ b/src/graph.py
@@ -9,6 +9,7 @@ import os
 FORMAT_1 = "s-b" # square marker, solid line, blue.
 FORMAT_2 = "o-r" # circle marker, solid line, red.
 FORMAT_3 = "^:m" # triangle_up marker, dotted line, magenta.
+
 def get_json_data():
     """ iterates over the reslut dir in order
     to merge the data from different results .json and
@@ -38,6 +39,7 @@ Service: {service_name} ({dst_details['city']}, {dst_details['region']}, {dst_de
 Distance: {round(distance)} km")
         l_axes.set_xlabel("Request rate [request/s]")
         l_axes.set_ylabel("Reply time [ms]")
+        l_axes.set_ylim(0, 1000)
 
         line1 = l_axes.plot(req_rate, reply_time , FORMAT_1, label="Reply time [ms]")
 
@@ -57,12 +59,12 @@ Distance: {round(distance)} km")
 def distance_graph(service_name, client_data):
     clients = []
     distances = []
-    max_reply_times = []
+    avg_reply_times = []
 
     for client, data in client_data.items():
         clients.append(client)
         distances.append(round(data["distance"]))
-        max_reply_times.append(round(data["max_reply_time"]))
+        avg_reply_times.append(round(data["avg_reply_time"]))
 
     with plt.rc_context({"axes.autolimit_mode": "round_numbers"}):
         figure, l_axes = plt.subplots()
@@ -71,15 +73,15 @@ def distance_graph(service_name, client_data):
         l_axes.set_xlabel("Distance from server [km]")
         l_axes.set_ylabel("Peak reply time [ms]")
 
-        scatter = l_axes.plot(distances, max_reply_times, color="b", linestyle="None", marker="o")
+        scatter = l_axes.plot(distances, avg_reply_times, color="b", linestyle="None", marker="o")
 
         for i, client_name in enumerate(clients):
-            l_axes.annotate(client_name, (distances[i], max_reply_times[i]))
+            l_axes.annotate(client_name, (distances[i], avg_reply_times[i]))
 
         #trendline
-        polyfit = np.polyfit(distances, max_reply_times, 1)
-        trendline = np.poly1d(polyfit)
-        l_axes.plot(distances, trendline(distances), color="r", linestyle="solid")
+        #polyfit = np.polyfit(distances, avg_reply_times, 1)
+        #trendline = np.poly1d(polyfit)
+        #l_axes.plot(distances, trendline(distances), color="r", linestyle="solid")
 
         figure.tight_layout()
         figure.savefig("./img/" + service_name + "_distance.png")
@@ -91,21 +93,18 @@ def location_graph(service_name, client_data):
 
     clients = []
     distances = []
-    max_reply_times = []
-    hops = []
+    avg_reply_times = []
 
     for client, data in client_data.items():
         clients.append(client)
         distances.append(round(data["distance"]))
-        max_reply_times.append(round(data["max_reply_time"]))
-        hops.append(data["hops"])
+        avg_reply_times.append(round(data["avg_reply_time"]))
 
     x = np.arange(len(clients))
     width = 0.2 # Bar width
 
-    rects1 = ax.bar(x - width/2, max_reply_times, width, label='Peak reply time [ms]', color='r')
+    rects1 = ax.bar(x - width/2, avg_reply_times, width, label='Peak reply time [ms]', color='r')
     rects2 = ax.bar(x + width/2, distances, width, label='Distance to server [km]', color='g')
-    #rects3 = ax.bar(x + width, hops, width, label='Nr. of network nodes to server [#]', color='b')
     ax.set_xticks(x)
     ax.set_xticklabels(clients)
     plt.xticks(fontsize=8, rotation=30)
@@ -118,7 +117,7 @@ def location_graph(service_name, client_data):
                     xytext=(0, 3),  # 3 points vertical offset
                     textcoords="offset points",
                     ha='center', va='bottom')
-    #upper_bound = max(max(max_reply_times), max(distances))
+    #upper_bound = max(max(avg_reply_times), max(distances))
     #ax.set_yticks([p for p in range(0, round(upper_bound + (upper_bound/100)), round(upper_bound/10))])
 
     r_patch = mpatches.Patch(color='r', label=rects1.get_label())
@@ -140,15 +139,13 @@ def graph(result):
         "service1": {
             "client1": {
                 "distance: 100,
-                "max_reply_time: 50,
-                "hops": 10
+                "avg_reply_time: 50
             },
         },
         "service2": {
             "client1": {
                 "distance: 100,
-                "max_reply_time: 50,
-                "hops": 10
+                "avg_reply_time: 50
         },
     }
     """
@@ -169,7 +166,7 @@ def graph(result):
 
                 result["request rate"] = float(measure["Request rate"])
                 result["reply time"] = float(measure["Reply time [ms]"]["response"])
-                error = 100 * float(measure["Errors"]["total"]) / float(measure["Total"]["requests"])
+                error = 100 * float(measure["Errors"]["total"]) / float(measure["Total"]["connections"])
                 result["error rate"] = error
 
                 results.append(result)
@@ -188,8 +185,9 @@ def graph(result):
                 service_metrics[service_name][client_name] = {}
 
             service_metrics[service_name][client_name]["distance"] = distance
-            service_metrics[service_name][client_name]["max_reply_time"] = max(reply_time)
-            service_metrics[service_name][client_name]["hops"] = 10 # PLACEHOLDER
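+            # "peak" reply time: average the three longest reply times to soften single outliers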
+            reply_time.sort()
+            reply_time_avg = sum(reply_time[-3:])/len(reply_time[-3:])
+            service_metrics[service_name][client_name]["avg_reply_time"] = reply_time_avg
 
 
     for service, data in service_metrics.items():