Posts Tagged ‘Networking’
Simple network emulation
Sunday, May 23rd, 2010
I needed to set up a test environment for a client-server application that emulates different network speeds and latencies. A Google query for “wan emulation” or “network emulation” returns many results, most of them paid apps and appliances. Then I remembered a podcast I had listened to recently about pfSense. I downloaded the nightly build of the 2.0 beta and installed it on an old box we had lying around with two network cards.
The installation process was very straightforward. After installing, I had to tell pfSense which network interface is the WAN side (in my case, the one connected to the GraphTech internal LAN) and which is the LAN side (in my case, the one connected to the server I was testing). From there it is all a matter of configuring the system through the simple web interface, reached from the server machine on the LAN side.
I set up a 1:1 NAT between pfSense and the server under test, and a firewall rule to pass all traffic to the server. I also set up a few limiters, the queues that emulate different network speeds, and added them to the firewall rule as In/Out queues. To test different upload and download speeds, latency and even out-of-order packets, all I need to do now is change the In/Out queues in the firewall rule!
C++ MicroSleep(int microsecs) from userland Windows/UNIX
Monday, January 11th, 2010
Recently I was investigating various strategies to measure point-to-point bandwidth on the Internet. Most of the recent methods use the packet-pair technique: they send multiple packets to the destination with a precise time spacing between them, then infer network properties from the change in spacing at the receiving end. To get reliable results, especially given the speed of the Internet today, the spacing between sent packets needs to be on the order of microsecs.
The ideal platform for making these measurements would be a kernel driver or a real-time OS. However, I needed them from userland on both Windows and Linux. How can we achieve a microsec sleep on these platforms? Windows only provides a millisec-resolution sleep via Sleep(), and the UNIX function usleep() doesn’t guarantee precision, or even a resolution better than a millisec. Both Windows and UNIX also accept a microsec timeout via the select() function, but my measurements showed that the resolution was no better.
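If you want to check the sleep resolution on your own machine, something like the following sketch will do. It assumes a C++11 compiler and uses std::this_thread::sleep_for rather than Sleep() or usleep() directly, so the numbers are only indicative of what the underlying system call does. The name WorstCaseSleepUs is hypothetical.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: request a 1-microsec sleep `runs` times and return the
// longest time, in microsecs, that any single request actually took.
long long WorstCaseSleepUs(int runs)
{
    using Clock = std::chrono::steady_clock;
    long long worst = 0;
    for(int i = 0; i < runs; i++)
    {
        const auto start = Clock::now();
        std::this_thread::sleep_for(std::chrono::microseconds(1));
        const long long elapsed =
            std::chrono::duration_cast<std::chrono::microseconds>(
                Clock::now() - start).count();
        if(elapsed > worst)
        {
            worst = elapsed;
        }
    }
    return worst;
}
```

The value returned depends heavily on the OS, the kernel timer configuration and the system load, so treat it as a measurement of one machine at one moment rather than a constant.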
Below is my implementation of a quasi-precise microsec sleep on both Windows and UNIX. The idea is to obtain the start time in microsecs and then sit in a tight loop waiting for the timer to reach the required sleep period. The problem with this approach is that it consumes CPU during the entire wait. To alleviate this, we can measure the time resolution of the system sleep call and the system scheduler yield call. Then, we can use these functions in a controlled manner in order to perform a nice sleep for most of the desired period, reserving the tight wait loop for the last part.
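Before the full listing, here is the same idea boiled down to a short sketch, assuming a C++11 compiler with std::chrono. It is not the implementation below: it skips the self-calibration and the yield phase, and takes the sleep overshoot as a caller-supplied guess (slackUs). The name HybridSleepUs is hypothetical.

```cpp
#include <chrono>
#include <thread>

// Sketch only: wait for roughly `microsecs` by using a coarse OS sleep for
// the bulk of the period and a tight loop for the tail.
// `slackUs` is an assumed estimate of how far the OS sleep may overshoot.
// Returns the time actually spent, in microsecs.
long long HybridSleepUs(long long microsecs, long long slackUs)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    const auto deadline = start + std::chrono::microseconds(microsecs);

    // Coarse phase: only worth sleeping if the request exceeds the slack.
    if(microsecs > slackUs)
    {
        std::this_thread::sleep_for(
            std::chrono::microseconds(microsecs - slackUs));
    }

    // Fine phase: spin until the deadline. This is the part that burns CPU.
    while(Clock::now() < deadline)
    {
    }

    return std::chrono::duration_cast<std::chrono::microseconds>(
        Clock::now() - start).count();
}
```

A call such as HybridSleepUs(2000, 1000) sleeps for about one millisec and spins for the rest. If slackUs is set too low, the coarse sleep can run past the deadline and the call returns late; set it too high and more of the wait is spent spinning.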
I observed that the scheduler yield call (sched_yield() on Linux, Sleep(0) on Windows) typically takes a few microseconds, so it is possible to use this call during the tight wait loop if precision down to this level is not required or if you want to avoid aggressive use of the CPU. The function below accepts a boolean (aggressive) to control this: when it is false, the function yields for most of the final wait; when it is true, it spins for the whole of it.
#ifdef _WINDOWS
#include <windows.h>
#else
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
#include <sched.h>
#include <math.h>
#endif
#ifdef _WINDOWS
#define MICROTIME_SAVE(t) QueryPerformanceCounter( (LARGE_INTEGER*)&t )
#define MICROTIME_DIFF(t1, t2, freq) (int)((t2.QuadPart >= t1.QuadPart ? \
t2.QuadPart - t1.QuadPart : \
0xffffffffULL - t1.QuadPart + 1 + t2.QuadPart) * 1000000ULL / freq.QuadPart)
#define USLEEP(t) Sleep((int)((t > 1000) ? t / 1000 : 1))
#define SCHED_YIELD() Sleep(0)
#define MICROTIME_ADD(t, i, freq) \
(t.QuadPart += ((unsigned long long)i * freq.QuadPart / 1000000ULL))
#define MICROTIME_SUB(t, i, freq) \
(t.QuadPart -= ((unsigned long long)i * freq.QuadPart / 1000000ULL))
#define MICROTIME_REACHED(curr, exp) ((curr.QuadPart < exp.QuadPart) ? \
(exp.QuadPart - curr.QuadPart > 0x80000000) :\
(curr.QuadPart - exp.QuadPart < 0x80000000))
#else
#define MICROTIME_SAVE(t) gettimeofday(&t, NULL)
#define MICROTIME_DIFF(t1, t2, freq) (int)((t2.tv_sec - t1.tv_sec) * 1000000 + \
(t2.tv_usec - t1.tv_usec))
#define USLEEP(t) usleep(t)
#define SCHED_YIELD() sched_yield()
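// Add i microsecs to t, carrying whole seconds so that tv_usec stays below
// 1000000 even when i is one second or more.
#define MICROTIME_ADD(t, i, freq) { t.tv_sec += (i) / 1000000; \
t.tv_usec += (i) % 1000000; if(t.tv_usec >= 1000000) \
{ t.tv_sec++; t.tv_usec -= 1000000; } }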
#define MICROTIME_SUB(t, i, freq) { if(i > t.tv_usec) \
{ t.tv_sec--; t.tv_usec = 1000000 - (i - t.tv_usec); } \
else { t.tv_usec -= i; } }
#define MICROTIME_REACHED(curr, exp) (curr.tv_sec > exp.tv_sec || \
(curr.tv_sec == exp.tv_sec && curr.tv_usec > exp.tv_usec))
#endif
int MicroSleep(int microsecs, bool aggressive)
{
static int minSleep = 1;
static int minYield = 1;
static bool bNotInit = true;
int interval, diff;
#ifdef _WINDOWS
static ULARGE_INTEGER freq;
ULARGE_INTEGER tv1, tv2, exp, exp2, curr;
#else
static int freq;
struct timeval tv1, tv2, exp, exp2, curr;
#endif
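if(bNotInit)
{
#ifdef _WINDOWS
// Get clock frequency
QueryPerformanceFrequency( (LARGE_INTEGER*)&freq );
#else
freq = 1;
#endif
// NOTE: the loop bounds and the timing calls inside the two calibration
// loops below were lost from the original listing. They are reconstructed
// here; the iteration count of 20 is an assumption.
// Find out the resolution of usleep().
for(int i = 0; i < 20; i++)
{
MICROTIME_SAVE(tv1);
USLEEP(1);
MICROTIME_SAVE(tv2);
diff = MICROTIME_DIFF(tv1, tv2, freq);
if(diff > minSleep)
{
minSleep = diff;
}
}
// Double the worst case observed, as a safety margin.
minSleep *= 2;
// Find out the resolution of sched_yield()
for(int i = 0; i < 20; i++)
{
MICROTIME_SAVE(tv1);
SCHED_YIELD();
MICROTIME_SAVE(tv2);
interval = MICROTIME_DIFF(tv1, tv2, freq);
if(interval > minYield)
{
minYield = interval;
}
}
minYield *= 2;
bNotInit = false;
}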
MICROTIME_SAVE(tv1);
exp = tv1;
MICROTIME_ADD(exp, microsecs, freq);
interval = microsecs - minSleep - 2 * minYield;
if(interval > 0)
{
USLEEP(interval);
}
if(!aggressive && (microsecs > minYield))
{
// We can only use sched_yield() until we reach within minYield of the
// dead-line. Work out the cutoff for its use.
exp2 = exp;
MICROTIME_SUB(exp2, 4 * minYield, freq);
while(1)
{
MICROTIME_SAVE(curr);
if(MICROTIME_REACHED(curr, exp2))
{
break;
}
SCHED_YIELD();
}
}
while(1)
{
MICROTIME_SAVE(curr);
if(MICROTIME_REACHED(curr, exp))
{
break;
}
}
MICROTIME_SAVE(tv2);
return MICROTIME_DIFF(tv1, tv2, freq);
}
Testing shows that this code produces precise results, with occasional errors on the order of 10 – 50 microsecs. These outliers are the result of context switching, and they can be reduced by running the thread at a higher priority level. In a later post, I will demonstrate how to create a worker thread that runs at a high priority and provides a precise microsec frequency signal that other threads can listen to via a semaphore.
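As a rough sketch of the priority change mentioned above (not the worker-thread design promised for the later post), the calling thread can ask for a higher scheduling priority as shown here. The function name RaiseCurrentThreadPriority is hypothetical. This sketch tests the compiler-defined _WIN32 macro rather than the _WINDOWS project macro used in the listing, and on Linux the request normally fails unless the process has root or CAP_SYS_NICE.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

// Sketch: try to raise the scheduling priority of the caller.
// Returns true if the OS accepted the request, false otherwise.
bool RaiseCurrentThreadPriority()
{
#ifdef _WIN32
    return SetThreadPriority(GetCurrentThread(),
                             THREAD_PRIORITY_TIME_CRITICAL) != 0;
#else
    struct sched_param param;
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if(param.sched_priority == -1)
    {
        return false;
    }
    // A pid of 0 means "the caller". Without sufficient privileges this
    // fails with EPERM, which is reported here as false.
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0;
#endif
}
```

Be careful when combining a real-time priority with a tight wait loop: a thread spinning at the highest priority can starve everything else on that CPU, so keep the spin phase short.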
Expert Reading
Thursday, January 17th, 2008
GraphTech Experts have selected a number of articles that they have found interesting and relevant. These are listed below, by topic area:
