Man In The Middle
I got pulled into a problem at work yesterday. Some users, seemingly at random, start hearing one-way audio on a phone call that had been working fine moments before. Packet captures showed that when the problem occurs, the audio stream makes it all the way to the IP phone but is never played on the user’s handset. A colleague had analyzed the captures and found an anomaly around the time the audio stopped: the RTP timestamp in the packets jumped significantly. The working theory was that the jump confused the phone and caused it to stop playing the audio. It felt like a plausible theory, but it also felt like it could be a red herring.
The obvious next step was to reproduce the problem in a controlled environment to test the theory. At first we didn’t think there was a good way to do this, but then I had an idea: insert something in the call path that could rewrite RTP packets while they were still in transit to their destination. If the packets could be manipulated in real time, we could trigger the timestamp jump at will during a phone call.
The rest of this entry details how that was accomplished. Read on if you are interested.
There are several ways to insert a device into the packet path. I happen to have a capable Cisco router in front of my network, so I chose policy-based routing (PBR) for this part of the task.
To enable policy routing, all that is needed is an access-list and a route-map. The access-list consists of a single entry:
access-list 120 permit udp any host 192.168.1.30
The route-map isn’t much more complex:
route-map pbr permit 10
match ip address 120
set ip next-hop 192.168.1.22
The logic is this: match any UDP packet destined for the phone’s IP address (192.168.1.30) and change its next hop to the Linux box (192.168.1.22). To take effect, the policy must be applied to the router’s ingress interface, in interface configuration mode, with the command:
ip policy route-map pbr
Running tcpdump on the Linux box confirmed that packets destined for the phone were now being redirected to it. This also had the side effect of breaking audio in one direction, since the RTP packets were being sent to the Linux box instead of the phone.
The next step was to write a program that could listen for and decode the packets. I used Perl and Net::Pcap for this. Net::Pcap provides a nice interface to libpcap, the same capture library that tcpdump and Wireshark use. Once the information was decoded, the final step was to create a raw socket. Raw sockets allow hand-crafted packets to be placed on the wire, and they are the mechanism that sent the manipulated packets on to their true destination. Below is the code that accomplishes this:
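[code lang="perl"]
#!/usr/bin/perl
use strict;
use warnings;
use Socket;          # AF_INET, SOCK_RAW, inet_aton, sockaddr_in
use Net::Pcap ();
$| = 1;              # unbuffered output so the packet counter updates live
our $cfg = {
    'dev'           => 'eth1', # interface that receives the redirected RTP
    'seen'          => 0,      # set once the timestamp jump has been reported
    'count'         => 0,      # number of matching packets seen so far
    'ether_hdr_len' => 14,     # Ethernet II header, no VLAN tag
    'udp_hdr_len'   => 8,
    'ip_source_loc' => 12,     # offset of the source address in the IP header
};
#------------------------------------------------------------------------------
# Open device for live capture (snaplen 1546, promiscuous, no read timeout)
#------------------------------------------------------------------------------
my $err = '';
my $pcap = Net::Pcap::open_live($cfg->{'dev'}, 1546, 1, 0, \$err) ||
    die "Can't open $cfg->{'dev'} for sniffing ($err)\n";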
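#------------------------------------------------------------------------------
# Create a filter to ensure we are only seeing packets of interest. Matching
# on the Ethernet source (presumably the router, which forwards the PBR
# traffic) also keeps the script from re-capturing the packets it sends
# back out, since those carry this machine's own MAC address.
#------------------------------------------------------------------------------
my $filt = "ether src 00:0b:be:59:92:20 && src host 10.10.10.10 && " .
           "dst host 192.168.1.30";
my $cfilt;
if (Net::Pcap::compile($pcap, \$cfilt, $filt, 1, 0) == -1) {
    die "Unable to compile filter string $filt\n";
}
Net::Pcap::setfilter($pcap, $cfilt);
#------------------------------------------------------------------------------
# Start the capture. The subroutine process_packet will be called for each
# packet that matches the filter (a count of -1 means loop indefinitely).
#------------------------------------------------------------------------------
my $rv = Net::Pcap::loop($pcap, -1, \&process_packet, undef);
die "Net::Pcap::loop stopped due to an unexpected error\n" if ($rv == -1);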
sub process_packet {
    our $cfg;
    my ($user_data, $meta_data, $packet) = @_;
    #--------------------------------------------------------------------------
    # Get IP header length. IP headers can be variable in length though they
    # are typically 20 bytes. The low four bits of the first byte hold the
    # header length in 32-bit words, so multiply by 4 to get bytes.
    #--------------------------------------------------------------------------
    my $ver_len_fld = ord substr($packet, $cfg->{'ether_hdr_len'}, 1);
    my $ip_hdr_len  = ($ver_len_fld & 0x0F) * 4;
    #--------------------------------------------------------------------------
    # The RAW socket needs to be created, but only once. The first time through
    # parse the packet and pull out interesting bits. The only thing that is
    # really needed is the destination IP address.
    #--------------------------------------------------------------------------
    if (!exists $cfg->{'sock'}) {
        my $ip_off  = $cfg->{'ether_hdr_len'};
        my $udp_off = $ip_off + $ip_hdr_len;
        #----------------------------------------------------------------------
        # Get source/destination IPs and ports ('N' and 'n' unpack 32- and
        # 16-bit big-endian values, which is network byte order)
        #----------------------------------------------------------------------
        print "Source loc: $ip_off+$cfg->{'ip_source_loc'}\n";
        my $src_loc     = $ip_off + $cfg->{'ip_source_loc'};
        my $source_int  = unpack('N', substr($packet, $src_loc, 4));
        my $dest_int    = unpack('N', substr($packet, $src_loc + 4, 4));
        my $source_port = unpack('n', substr($packet, $udp_off, 2));
        my $dest_port   = unpack('n', substr($packet, $udp_off + 2, 2));
        #----------------------------------------------------------------------
        # In the raw packet the source and destination IPs are 32-bit
        # integers. Convert them to dotted decimal.
        #----------------------------------------------------------------------
        my $source_ip = int_to_ip($source_int);
        my $dest_ip   = int_to_ip($dest_int);
        #----------------------------------------------------------------------
        # Get the RTP timestamp (bytes 4-7 of the RTP header)
        #----------------------------------------------------------------------
        my $timestamp = unpack('N',
            substr($packet, $udp_off + $cfg->{'udp_hdr_len'} + 4, 4));
        print "Src: $source_ip:$source_port / Dst: $dest_ip:$dest_port / ",
              "TS: $timestamp\n";
        #----------------------------------------------------------------------
        # Create the RAW socket. Protocol 255 is IPPROTO_RAW; with IP_HDRINCL
        # set, the kernel expects the buffer to start with a complete IP
        # header, so the captured packet can be put back on the wire.
        #----------------------------------------------------------------------
        socket($cfg->{'sock'}, AF_INET, SOCK_RAW, 255) ||
            die "Socket creation failed ($!)\n";
        setsockopt($cfg->{'sock'}, Socket::IPPROTO_IP(),
                   Socket::IP_HDRINCL(), 1) ||
            die "Unable to set IP_HDRINCL ($!)\n";
        #----------------------------------------------------------------------
        # For this type of socket you have to specify the destination address,
        # so pack and store that for later. inet_aton() turns the dotted quad
        # into the 4-byte form that sockaddr_in() expects.
        #----------------------------------------------------------------------
        $cfg->{'dest'} = sockaddr_in($dest_port, inet_aton($dest_ip));
    }
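    #--------------------------------------------------------------------------
    # Keep count of the packets we've seen. We'll use this to decide when to
    # jump the timestamp.
    #
    # Print the value so we know how much longer until the jump
    #--------------------------------------------------------------------------
    $cfg->{'count'}++;
    print "Count: $cfg->{'count'}\r";
    #--------------------------------------------------------------------------
    # After 20,000 packets are seen, modify the RTP timestamp to see if the
    # issue can be reproduced. Every later packet gets the same offset, so the
    # stream shows one discontinuity and then advances normally.
    #--------------------------------------------------------------------------
    if ($cfg->{'count'} >= 20000) {
        my $ts_off = $cfg->{'ether_hdr_len'} + $ip_hdr_len
                   + $cfg->{'udp_hdr_len'} + 4;
        #----------------------------------------------------------------------
        # Get the timestamp of the current packet. RTP timestamps are 32-bit
        # unsigned values, so the new value wraps modulo 2**32.
        #----------------------------------------------------------------------
        my $timestamp = unpack('N', substr($packet, $ts_off, 4));
        my $ts = ($timestamp + 3_000_000_000) % 2**32;
        #----------------------------------------------------------------------
        # Update the RTP timestamp field in the packet with the new value.
        #----------------------------------------------------------------------
        substr($packet, $ts_off, 4, pack('N', $ts));
        #----------------------------------------------------------------------
        # The UDP checksum covers the RTP header, so it is now stale. In IPv4
        # a checksum of zero means "no checksum", so clear it rather than
        # forward a packet the receiver might discard.
        #----------------------------------------------------------------------
        substr($packet, $cfg->{'ether_hdr_len'} + $ip_hdr_len + 6, 2, "\0\0");
        #----------------------------------------------------------------------
        # Alert that the jump has occurred.
        #----------------------------------------------------------------------
        if (!$cfg->{'seen'}) {
            print "TS: $timestamp - New TS: $ts - Epoch: ", time(), "\n\n";
            $cfg->{'seen'} = 1;
        }
    }
    #--------------------------------------------------------------------------
    # Strip the original Ethernet header off of the packet. The kernel will
    # apply this machine's own Ethernet header.
    #--------------------------------------------------------------------------
    my $p = substr($packet, $cfg->{'ether_hdr_len'});
    #--------------------------------------------------------------------------
    # Send the manipulated packet on to its destination
    #--------------------------------------------------------------------------
    defined send($cfg->{'sock'}, $p, 0, $cfg->{'dest'})
        or warn "send failed ($!)\n";
    #print_dump($packet);
}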
#------------------------------------------------------------------------------
# Convert an integer-based IP to dotted decimal notation
#------------------------------------------------------------------------------
sub int_to_ip {
    my $int = shift;
    return join('.', unpack('C4', pack('N', $int)));
}
#------------------------------------------------------------------------------
# This routine does a side-by-side hex and ascii dump of the packet contents.
# It is only used for troubleshooting. Courtesy of John Jetmore
#------------------------------------------------------------------------------
sub print_dump {
    my $packet = shift;
    my $n = length($packet);
    print "Packet length: $n\n";
    my $c = 0;   # offset of the current line
    my $i = 16;  # bytes per line
    my $s = '';  # the subject string
    while (length($packet) && ($packet =~ s|^(.{1,$i})||s)) {
        $s = $1;
        my @c = map { ord($_) } split(//, $s);
        $s =~ s|[^\x21-\x7E]|.|g;
        my $hfs = '';
        for (my $hc = 0; $hc < $i; $hc++) {
            $hfs .= ' ' if (!($hc % 4));
            if ($hc < scalar(@c)) {
                $hfs .= '%02X ';
            } else {
                $hfs .= '   ';   # pad missing bytes so the ASCII column lines up
            }
        }
        printf("%04d:$hfs %-16s\n", $c, @c, $s);
        $c += $i;
    }
    print "\n";
}
[/code]
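The script reads the RTP timestamp 4 bytes past the end of the UDP header because that is where the field sits in the fixed 12-byte RTP header defined in RFC 3550. As a standalone sketch of that layout, the hypothetical `parse_rtp_header` helper below (not part of the original script) decodes every fixed field, including the marker bit and payload type, from the start of a UDP payload. It assumes the payload begins with a plain RTP header and does not decode CSRC lists or header extensions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): decode the fixed
# 12-byte RTP header (RFC 3550, section 5.1) at the start of a UDP payload.
# CSRC identifiers and header extensions are not decoded.
sub parse_rtp_header {
    my ($payload) = @_;
    return if length($payload) < 12;    # too short to hold an RTP header

    # C: byte 0 (V, P, X, CC)   C: byte 1 (M, PT)
    # n: 16-bit sequence number   N: 32-bit timestamp   N: 32-bit SSRC
    # 'n' and 'N' are big-endian, i.e. network byte order.
    my ($b0, $b1, $seq, $ts, $ssrc) = unpack('C C n N N', $payload);

    return {
        version      => $b0 >> 6,
        padding      => ($b0 >> 5) & 0x01,
        extension    => ($b0 >> 4) & 0x01,
        csrc_count   => $b0 & 0x0F,
        marker       => $b1 >> 7,
        payload_type => $b1 & 0x7F,
        seq          => $seq,
        timestamp    => $ts,
        ssrc         => $ssrc,
    };
}

# Example: version 2, marker set, payload type 0 (PCMU), sequence 1,
# timestamp 160, arbitrary SSRC, followed by 160 bytes of 0xFF payload.
my $hdr = pack('C C n N N', 0x80, 0x80, 1, 160, 0xDEADBEEF);
my $rtp = parse_rtp_header($hdr . ("\xFF" x 160));
printf "v=%d m=%d pt=%d seq=%d ts=%d\n",
    @{$rtp}{qw(version marker payload_type seq timestamp)};
```

Decoding with a single `unpack` template keeps the field widths and byte order in one place, which is easier to check against the RFC than a chain of `substr` offsets. With the example header it prints `v=2 m=1 pt=0 seq=1 ts=160`.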
In the end, causing the RTP timestamp to jump did not have any impact on the call audio. At least now we know to move on and look for other causes.
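For a sense of scale: at an 8 kHz RTP clock (an assumption; it is the usual rate for G.711, but the codec in use is not stated here), the 3,000,000,000-tick jump the script injects is 375,000 seconds, roughly 104 hours forward. Because RTP timestamps are 32-bit and wrap, a receiver that compares timestamps using signed, modulo-2^32 differences (in the spirit of RFC 1982 serial-number arithmetic) would instead see a backward step of 1,294,967,296 ticks, roughly 45 hours. How any particular phone interprets such a jump is implementation-specific; the hypothetical `rtp_ts_delta` helper below only makes the arithmetic concrete.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: signed difference between two 32-bit RTP timestamps,
# treating the timestamp space as circular (modulo 2**32). The result lies in
# the range -2**31 .. 2**31 - 1.
sub rtp_ts_delta {
    my ($old, $new) = @_;
    my $d = ($new - $old) % 2**32;    # Perl's % gives 0 .. 2**32 - 1 here
    $d -= 2**32 if $d >= 2**31;       # fold the upper half into negatives
    return $d;
}

my $clock_hz = 8000;                  # assumed 8 kHz RTP clock (e.g. G.711)
my $old      = 160;                   # arbitrary pre-jump timestamp
my $new      = ($old + 3_000_000_000) % 2**32;   # offset used by the script

my $delta = rtp_ts_delta($old, $new);
printf "delta = %d ticks (%.1f hours at %d Hz)\n",
    $delta, $delta / $clock_hz / 3600, $clock_hz;
```

With these inputs the script prints `delta = -1294967296 ticks (-45.0 hours at 8000 Hz)`.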
Responses to “Man In The Middle”
I can’t believe I’m sending you this on your blog…
So, a couple of things that were observed after your experiment:
1) It only seems to happen when the person who loses inbound audio is speaking
2) When it happens on calls that lose audio (because we know it can happen on calls that don’t lose audio), every byte of the RTP payload in the first “jumped” packet consists of 0xFF.
3) There’s a reference on voip-info to using the Timestamp field of RTP to do silence suppression (http://www.voip-info.org/wiki/view/RTP+Silence+Suppression). We don’t have the marker bit set when it occurs, but otherwise doesn’t that sound very similar?
I’m still kind of leaning toward this being a red herring because otherwise it’s getting into too much of a “maybe if you close your left eye and cross your fingers” type failure, but I also don’t think we proved it wasn’t the issue.
And, in case I never said it, much props for recreating this in the lab, very cool…
Thanks for making me think about this on a nice quiet Friday night surfing some blogs =).
nerd