The Problem - When making VoIP calls (particularly with SIP) you can
ring phone numbers but once the call is answered there is either no
voice or it is only one way.
The Cause - I am pretty sure the
cause is the same whichever protocol you use for your VoIP solution,
but I only have experience of SIP. So this is definitely an issue with
SIP; I haven't confirmed it with the other protocols.
The problem arises because VoIP uses
dynamic UDP ports for each call. This causes problems when traversing a
NAT device for two reasons. The first is that the NAT device changes the
source port of outbound packets as part of the NAT process. The second
is that UDP is connectionless and is often used for one-way traffic
(broadcasts, video streams etc). Where TCP traffic is bi-directional
across one connection, UDP voice traffic can use one flow for inbound
and another for outbound, meaning they can use different ports. If the
inbound flow uses different ports from the outbound one, the inbound
traffic will be dropped because the NAT device does not have a mapping
for it in its NAT table. If you are confused by now I suggest you read
up on NAT first.
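To make the NAT table idea concrete, here is a minimal sketch of a toy NAT in Python. It is not how any real device is implemented; the class name `ToyNat`, the public IP and the starting port 40000 are all made up for illustration. It only shows that a reply on the mapped port is delivered while a packet on any other port has no mapping and is dropped.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class ToyNat:
    """A deliberately simplified NAT: one public IP, one mapping per internal socket."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.table: Dict[int, Endpoint] = {}  # public port -> internal endpoint

    def outbound(self, src: Endpoint) -> Endpoint:
        """Translate an outbound packet's source and remember the mapping."""
        for public_port, internal in self.table.items():
            if internal == src:
                return (self.public_ip, public_port)
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = src
        return (self.public_ip, public_port)

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Return the internal destination, or None if the packet is dropped."""
        return self.table.get(public_port)


nat = ToyNat("203.0.113.10")                    # hypothetical public address
mapped = nat.outbound(("192.168.1.1", 49152))   # source port gets rewritten
reply_ok = nat.inbound(mapped[1])               # reply on the mapped port: delivered
reply_bad = nat.inbound(49152)                  # reply on the original port: dropped
```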
What is SIP and why is it important to VoIP
Just as TCP/IP is not a protocol by itself but rather a family of
protocols like TCP, IP, PPP, PPTP, ARP etc, so is VoIP. There are
several protocols you can use with VoIP, each having its own pros and
cons. The one we will focus on in this article though is SIP. SIP stands
for Session Initiation Protocol. It is responsible for setting up the
call, ringing, signalling, engaged tones etc.
In most SIP environments there will
be several VoIP calls in use concurrently. Every one of these calls
will be managed through the VoIP switch, each one requiring its own
voice channel. Each channel (or phone call to look at it another way)
must use a unique port. If there are 100 concurrent VoIP calls in use
there must be 100 ports available for the VoIP switch to allocate to
each call. This is where SIP comes in. It basically controls everything
that is needed in setting up the call. For each call SIP will find a
spare port, allocate it, send these details to all parties, set the call
up and ring the phones. Once the call has finished SIP terminates the
session and informs the phone switch that this port can be reassigned to
another call.
The range of ports is usually configurable. Avaya,
for example, allows you to configure this in the VoIP portion of the
system config. The default range for Avaya VoIP is 49152 to 53246. This
range holds 4095 ports, so in principle around 4000 concurrent VoIP
calls, licensing permitting.
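As a quick check of the arithmetic, here is a small sketch that counts the ports in an inclusive range. The function name `ports_in_range` is just for illustration, and whether each call really consumes exactly one port depends on the switch, so treat the result as an upper bound rather than a call capacity.

```python
def ports_in_range(first: int, last: int) -> int:
    """Number of port numbers in an inclusive range."""
    if last < first:
        raise ValueError("last must not be below first")
    return last - first + 1


# Default Avaya range quoted above; both end ports are counted.
avaya_default = ports_in_range(49152, 53246)
```

Counting both ends, the range holds 4095 port numbers, one more than simple subtraction suggests.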
In a LAN environment this is not a problem as
firewalls usually permit all traffic on all ports for all devices. Once
the internet is involved where the traffic has to traverse a NAT and
firewall we start to run into problems. In the Avaya example above it
can pick a port anywhere in the range of 49152 to 53246. You can't just
open this port range to the internet. A range of 4000 ports open isn't
very secure.
How SIP is meant to work on the internet
As with all network traffic one endpoint must initiate the connection
first. This means at least one port must be open, using port forwarding
to the VoIP switch. SIP usually runs on port 5060. For the two offices
to call each other both sites must forward this port to the phone
switch. When you read documentation on SIP most of it will say that this
is all you need to do... But in all likelihood this is not the case.
The following happens when you dial a VoIP number:
- You dial the number and your local VoIP switch matches this up with a site ID which locates the public IP address of the remote location.
- Your local VoIP will connect to the remote IP on port 5060 using SIP (which is why the port must be open).
- The two phone switches now negotiate and set up the phone call. Several things are done in the negotiation process but the most important one (for this article) being the ports that they will use to transmit the UDP voice streams.
The problem here is
that SIP doesn't know it is behind a NAT. Let's say your local switch IP
is 192.168.1.1 and the remote IP is 192.168.2.1. Although NAT rewrites
the addresses in the IP headers of the SIP packets to the public IPs
when traversing the internet, it does not change the actual data in the
SIP packets themselves (the payload). It is the payload that contains
the information about what ports and IP addresses to use for the actual
phone call. The local VoIP tells the remote VoIP (via SIP) to send voice
data to its local IP of 192.168.1.1 and vice versa. As we all know this
is never going to work as internet routers drop packets from and to
private IP addresses. Once the call is set up and the UDP voice data
actually starts transmitting it will be sent to private IPs and
consequently dropped. So how do we fix this?
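Here is a rough sketch of what that looks like from the remote side. In practice the payload in question is normally an SDP body carried inside the SIP message (the article only says "payload", so the SDP text below is a hypothetical, stripped-down example, and the function names are mine). The code pulls out the address and port the caller advertised and checks whether that address could ever be reached across the internet.

```python
import ipaddress
from typing import Tuple

# Hypothetical SDP body, as might be carried in a SIP INVITE from the local switch.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.168.1.1",
    "s=call",
    "c=IN IP4 192.168.1.1",
    "t=0 0",
    "m=audio 49152 RTP/AVP 0",
]) + "\r\n"


def media_target(sdp: str) -> Tuple[str, int]:
    """Extract the connection address (c=) and audio port (m=audio) from SDP text."""
    address = None
    port = None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            address = line.split()[2]
        elif line.startswith("m=audio "):
            port = int(line.split()[1])
    if address is None or port is None:
        raise ValueError("SDP is missing a c= or m=audio line")
    return address, port


def is_unroutable(address: str) -> bool:
    """True if the address is in a private or otherwise non-public range."""
    return ipaddress.ip_address(address).is_private


addr, port = media_target(EXAMPLE_SDP)
# The NAT rewrote the packet headers, but the payload still advertises this:
advertised_is_useless = is_unroutable(addr)
```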
STUN
STUN stands for Session Traversal Utilities for NAT and as you may have
guessed by its name it is a collection of utilities to aid in the
traversal of NAT devices.
STUN (as in our case) helps a program
or device learn whether it is behind a NAT and modify packets
accordingly. It requires the help of a 3rd party server on the internet
known as a STUN server. This now means that our VoIP phones can modify
their SIP content to contain the public IP instead of the private one.
Some of you may be thinking this same problem also affects ports.
It
is common with NAT to also change the source port of an outbound packet
to a new randomly generated one. When the remote device responds it
does so to this new random port. When packets come back in on this port
NAT allows it through because it mapped this port to the internal
client. As you might have guessed, this is also an issue for SIP. The
STUN server also takes this into account. The STUN client (the VoIP
switch) sends a UDP packet outbound on the port it wishes to use for the
VoIP call to the STUN server. This will be NATted to the public IP and a
new port number. The STUN server sends this information back allowing
the VoIP switch to learn its public IP and mapped (modified) external
port for the voice traffic. Now we have all the info we require to
modify the SIP data with the correct information to traverse a NAT. The
local switch now contacts the remote switch via SIP and tells it to send
the UDP voice call to its public IP and public port. Once this data
comes back the NAT has a mapping for this in the NAT table and sends it
to the internal VoIP switch. This is how I thought it should work... Have
you found what is wrong with this yet? I was stuck on this for a
while...
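For the curious, the STUN exchange described above can be sketched without touching the network. The message layout below follows my reading of the STUN specification (RFC 5389): a 20-byte header with a fixed magic cookie, and an XOR-MAPPED-ADDRESS attribute in the reply. The server side is faked locally with `build_binding_response`, the addresses and ports are invented, only IPv4 is handled, and none of this is taken from Avaya's implementation.

```python
import os
import socket
import struct
from typing import Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """20-byte STUN header with no attributes."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def build_binding_response(transaction_id: bytes, ip: str, port: int) -> bytes:
    """What a STUN server would send back (IPv4 only); faked here for a local demo."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, x_port, x_addr)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_mapped_address(response: bytes, transaction_id: bytes) -> Tuple[str, int]:
    """Recover the public IP and port that the server saw our packet arrive from."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN binding success response")
    if response[8:20] != transaction_id:
        raise ValueError("transaction ID mismatch")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _, family, x_port, x_addr = struct.unpack("!BBHI", value[:8])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            port = x_port ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


tid = os.urandom(12)
request = build_binding_request(tid)
# Pretend the NAT mapped our voice port to 198.51.100.7:40001 (made-up values).
response = build_binding_response(tid, "198.51.100.7", 40001)
public_ip, public_port = parse_mapped_address(response, tid)
```

The switch would then write `public_ip` and `public_port` into the SIP data instead of its private address and local port.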
The reason I was stuck was not through a lack of
understanding the technologies (honest), it was because of the stupid
documentation (from Avaya) I had on setting up SIP and my confidence
that it was right. I checked everything again and found I had done
everything correctly, then it hit me... I thought "Hold on, when the UDP
voice packets start coming in ON A RANDOM port how does it get through
the NAT device when the only port forwarding I have is 5060 for SIP???"
I
misled you above a bit on purpose to see if you could spot it
yourself. I said there was a mapping for the incoming UDP traffic in the
NAT table but there isn't. You, like me, may have assumed this because
you don't have to port forward any other ports. The only way traffic can
come into your network through a NAT without port forwarding is if it
was first requested from an outbound connection. The outbound connection
adds the entry in the NAT table to map incoming packets on this port to
the internal client. This added to my confusion. The documentation
clearly states you only need to port forward 5060 but the voice calls
use random UDP ports so how do these get past the NAT? If you are still
confused it will be because you don't understand (or have forgotten) one
fundamental difference between UDP and TCP which is very important for
us here.
TCP requires that one endpoint first establishes a
connection before data can be sent back. As we know you have inbound and
outbound connections. If I am making an outbound connection then it is
an inbound connection at the other end. An inbound connection requires
port forwarding, which we don't have set up in this scenario. Also, for
data to be sent back the socket MUST BE ESTABLISHED. This is very
important as it is not a requirement of UDP. UDP is connectionless,
remember (see The Differences Between TCP and UDP for more info). It can
send data without any handshake with the remote end. It is this
key difference between TCP and UDP that allows you to traverse a NAT
using UDP without port forwarding. The technique is called UDP hole
punching.
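If you want to see the "no connection needed" part for yourself, here is a tiny sketch using Python's standard `socket` module on the loopback interface. The function name is mine and the payload is arbitrary; the point is only that the sender never calls `connect()` and there is no handshake before the data goes out.

```python
import socket


def udp_send_without_connection() -> bytes:
    """Send one datagram over loopback with no handshake and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # let the OS pick a free port
    receiver.settimeout(2.0)          # don't hang forever if nothing arrives
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connect(), no established socket: the datagram is simply sent.
        sender.sendto(b"voice frame", receiver.getsockname())
        data, _ = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


received = udp_send_without_connection()
```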
UDP Hole Punching
Let's add all the technologies so far to get a working solution. Each
VoIP switch learns its own public IP and the ports to be used via the
STUN server. They then use SIP on port 5060 to send this information to
each other, then they use UDP hole punching for the delivery of the VoIP
packets.
UDP hole punching
is a clever technique. It works by "punching" holes through the NAT
device to create the NAT mappings. The local VoIP sends UDP packets to
the port and public IP of the remote VoIP that it was given in the
SIP data. When this data hits the NAT device at the remote location it
will not be delivered if the remote VoIP has not yet sent anything out,
because there is no port forwarding in place and no outbound mapping
exists yet. The same process happens from the remote VoIP to your local
VoIP, and its early packets can be dropped as well.
The purpose of this though is not to deliver the packets, it is to
"punch" a hole through the NAT and create a mapping of the external port
and IP to the internal port and IP, consequently allowing incoming
traffic on this port. As this happens at both ends we now have NAT
mappings for these ports to the internal clients. Because these mappings
now exist the NAT device treats packets from the other side as replies
to outbound requests and will accept them on the same port. So in
summary the first packets may well be dropped at either end, but they
"punch" holes through the NAT allowing all subsequent traffic to pass
through. This is why you don't need to port forward these ports when
using UDP. This technique suits UDP because UDP doesn't guarantee or
even check whether the packets arrive. When the first packet is dropped
it doesn't matter because the sender doesn't even know it failed (as UDP
has no acknowledgements), it just sends more UDP packets. It doesn't
work this simply with TCP because TCP must establish a connection before
sending data. If the initial packet is dropped, TCP keeps retrying the
handshake and sends no data until the connection is established.
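The hole punching sequence can be played out with a toy model. This is only a sketch under simplifying assumptions: each NAT has a single fixed public endpoint, it only admits packets from endpoints it has already sent to, and the packets are sent one after another rather than at the same moment. The class and function names and the addresses are invented.

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class RestrictedNat:
    """Toy NAT that accepts inbound packets only from endpoints it has already sent to."""

    def __init__(self, public_ip: str, public_port: int) -> None:
        self.public: Endpoint = (public_ip, public_port)
        self.allowed: Set[Endpoint] = set()  # remote endpoints we have sent to

    def send(self, dst: Endpoint) -> Endpoint:
        """Outbound packet: opens the 'hole' for dst and returns the translated source."""
        self.allowed.add(dst)
        return self.public

    def receive(self, src: Endpoint) -> bool:
        """Inbound packet: delivered only if a hole for src already exists."""
        return src in self.allowed


def deliver(sender: RestrictedNat, receiver: RestrictedNat) -> bool:
    """Send one UDP packet from behind `sender` to the public endpoint of `receiver`."""
    src = sender.send(receiver.public)
    return receiver.receive(src)


local = RestrictedNat("198.51.100.1", 40000)
remote = RestrictedNat("198.51.100.2", 40000)

first = deliver(local, remote)    # dropped: remote has not sent anything out yet
second = deliver(remote, local)   # delivered: local's first packet opened the hole
third = deliver(local, remote)    # delivered: remote's hole now exists too
```

In this sequential model only the very first packet is lost; everything after it passes in both directions, which is all the voice stream needs.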
So Why Does The Thing Still Fail??
OK, sorry for the long post but I am a big believer that the best
way to learn is by the teacher (me, ha) leading you down the path so you
solve it yourself rather than me. This is the last bit now I promise.
If
you never knew about UDP hole punching then you would naturally think
that you need to open ports to allow the UDP traffic through. This would
explain why you get no voice at all. But what about one way traffic?
This means that the port is open at one end and not the other. How is it
possible to have UDP hole punching working at one end and not at the
other when both NAT devices are configured the same?
In all
likelihood you have different types of NAT at each site. To complicate
things more NAT isn't standardised and there are various implementations
of it. In an ideal world the documentation I read about setting up SIP
would be correct because UDP hole punching would take care of the port
forwarding of the UDP traffic. But as we often find out this is never
the case...
It gets complicated and I am not going to re-invent
the wheel. What you are looking for is what type of NAT device you have.
It is probably a symmetric NAT as this is the one that is incompatible
with STUN. Yes this is the problem!! STUN doesn't work with a symmetric
NAT, here is why.
All the other types of NAT allow traffic from
different IPs to come back into the network as long as it is on that
port, regardless of where I sent the packets to. So if I connect to the
STUN server to learn the external IP and port to use for VoIP, this
mapping now exists. A DIFFERENT IP can send packets to me as long as
they use the same port I sent the UDP packets out on. In other words
once a mapping has been created and linked to the internal client it
will accept packets from any IP as long as they are on this port. (The
stricter non-symmetric types also want an outbound packet to that IP
first, which is exactly what hole punching provides.) This is not
allowed in a symmetric NAT. An outbound packet sent to a specific IP and
port will only allow packets coming back from that IP and port. So, we
do the same as above and contact the STUN server to get our public IP
and port. This info is sent to the remote VoIP via SIP. It now tries to
send data back to your local VoIP via this port but because it is a
different IP a symmetric NAT blocks it. This NAT mapping is exclusive to
the STUN server. To allow data to come in from the remote VoIP, which is
a different IP, a new mapping must be created, which uses a different
port... As you can see this is a problem because the port that will be
used for the actual UDP voice call is different to the one the STUN
server detected. Because the ports are dynamic and STUN won't work, your
local VoIP can never learn what external port will be used for
the traffic to and from the remote VoIP.
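Here is a small sketch of that difference. Both classes are toy models with invented names, and the addresses are placeholders (3478 is the usual STUN port). The only thing they model is how the external port is chosen: once per internal socket, or once per internal socket and destination.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class ConeNat:
    """Reuses one public port per internal socket, whatever the destination."""

    def __init__(self, first_port: int = 40000) -> None:
        self.next_port = first_port
        self.mappings: Dict[Endpoint, int] = {}

    def public_port(self, internal: Endpoint, dst: Endpoint) -> int:
        if internal not in self.mappings:
            self.mappings[internal] = self.next_port
            self.next_port += 1
        return self.mappings[internal]


class SymmetricNat:
    """Allocates a fresh public port for every (internal socket, destination) pair."""

    def __init__(self, first_port: int = 40000) -> None:
        self.next_port = first_port
        self.mappings: Dict[Tuple[Endpoint, Endpoint], int] = {}

    def public_port(self, internal: Endpoint, dst: Endpoint) -> int:
        key = (internal, dst)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        return self.mappings[key]


def stun_port_is_usable(nat) -> bool:
    """Does the port learned from the STUN server match the one the call really uses?"""
    switch = ("192.168.1.1", 49152)
    stun_server = ("198.51.100.50", 3478)
    remote_switch = ("198.51.100.2", 40000)
    learned = nat.public_port(switch, stun_server)    # what STUN reports back
    actual = nat.public_port(switch, remote_switch)   # what the voice stream gets
    return learned == actual


cone_ok = stun_port_is_usable(ConeNat())
symmetric_ok = stun_port_is_usable(SymmetricNat())
```

With the cone model the two ports agree, so the value advertised over SIP is correct. With the symmetric model they differ, so the remote VoIP ends up sending to a port that only the STUN server is allowed to use.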
This is why you get one
way traffic in some scenarios. If both NAT devices are non-symmetric
NATs they will get the correct information through STUN and voice flows
both ways OK. If one device is symmetric and the other is non-symmetric,
only one of them can get the correct info through STUN and data can pass
one way, producing the one-way audio. If both are symmetric you can't
hear anything at all because traffic can't get through either NAT
device.
So How Do I Fix It!?!?
Buy a new NAT device! One that isn't a symmetric one!!
Replacing
your NAT device is one solution but the other is far simpler than
you might think. All you need to do is the following:
- On your phone switch (Avaya in my case) reduce the dynamic port range. How many VoIP calls do you think you will have going at any one time max? For most of you reading this it will be 10 at a guess, maybe 20. In my case the range was 49152 to 53246 so I reduced the max to 49162, giving me roughly 10 ports (11 if you count both ends).
- On your NAT device set up port forwarding for these ports to your VoIP switch.
The
reason this works is that you are effectively mapping your external
port numbers to the same internal port numbers (remember that NAT
replaces port numbers with random ones by itself). You now know that
your VoIP will only use a small, fixed range of ports, even though STUN
will fail. Because STUN fails, the SIP information sent over to the
remote VoIP will actually list the internal ports and not the NATted
ones. This means your traffic goes out on random ports (because it is
NATted) but the remote VoIP sends back to ports in the range you
specified in your local VoIP. There won't be a NAT mapping for this of
course, and it would normally be blocked, which is why you use port
forwarding instead. Have Fun!
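If you have more than a handful of ports to forward, it can help to generate the rule list rather than type it. The sketch below is not tied to any particular router or firewall: `ForwardRule` and `build_forward_rules` are hypothetical names, the switch IP is the example address used earlier, and you would still have to translate each rule into whatever your NAT device's config expects.

```python
from typing import List, NamedTuple


class ForwardRule(NamedTuple):
    protocol: str
    external_port: int
    internal_ip: str
    internal_port: int


def build_forward_rules(first_port: int, call_count: int, switch_ip: str) -> List[ForwardRule]:
    """One UDP rule per port, mapping each external port to the same internal port."""
    if call_count < 1:
        raise ValueError("call_count must be at least 1")
    last_port = first_port + call_count - 1
    if last_port > 65535:
        raise ValueError("port range runs past 65535")
    return [
        ForwardRule("udp", port, switch_ip, port)
        for port in range(first_port, last_port + 1)
    ]


# Ten ports starting at the bottom of the Avaya default range.
rules = build_forward_rules(49152, 10, "192.168.1.1")
for rule in rules:
    print(f"{rule.protocol} {rule.external_port} -> {rule.internal_ip}:{rule.internal_port}")
```

Note that exactly ten ports starting at 49152 end at 49161, so set the upper bound on the switch to match whatever range you actually forward. The existing forward for SIP on port 5060 stays in place alongside these.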