Are encrypted Skype calls secure enough?

Is Skype really that secure? We know that it uses 256-bit AES to encrypt communication between users. But does that mean Skype itself is unable to eavesdrop on calls or chat messages? This web site suggests that some kind of text filtering is applied to Skype communication in China.

It is also no secret that, after being bought by eBay and before Microsoft acquired it, Skype provided user information to the US government. Today, however, I am going to look at whether we can extract any information from an encrypted Skype call using only the sent and received internet packets, without attempting any decryption.

Setup

In this experiment, Skype version 4.0.0.224 on a Windows XP machine (caller side) was used to make a voice call to another Skype client (version “2.2 Beta for Linux”) running on an Ubuntu machine (receiver side). The machines are co-located and belong to the same network. The internet packets were captured on both sides with Wireshark, a popular open-source network protocol analyzer that is available for both Linux and Windows.

On the caller side, an audio file of length 158 seconds was played directly into the audio input device, that is, no microphones or recording equipment were used. The audio file (8000 Hz, mono, 16-bit PCM WAV) is a phone conversation between a female and a male speaker of American English.

Data analyses

Captured packets were filtered to eliminate all incoming and outgoing non-Skype packets: All non-UDP (TCP, ARP, etc.) packets were ignored and only packets sent from the caller to the target Skype port were considered.
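The filtering step can be sketched in Python as follows. This is only an illustration: the `Packet` record, its field names, the IP addresses and the port number are all made up here, and the sketch assumes the capture has already been exported from Wireshark into such records.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical record layout for one captured packet.
    timestamp: float  # capture time in seconds
    protocol: str     # e.g. "UDP", "TCP", "ARP"
    src_ip: str
    dst_port: int
    size: int         # captured length in bytes


def filter_call_packets(packets, caller_ip, skype_port):
    """Keep only UDP packets sent by the caller to the target Skype port."""
    return [
        p for p in packets
        if p.protocol == "UDP"
        and p.src_ip == caller_ip
        and p.dst_port == skype_port
    ]


# Made-up sample capture; addresses and port are placeholders.
capture = [
    Packet(0.000, "UDP", "192.168.1.10", 40000, 140),
    Packet(0.004, "TCP", "192.168.1.10", 80, 60),
    Packet(0.011, "ARP", "192.168.1.1", 0, 42),
    Packet(0.020, "UDP", "192.168.1.10", 40000, 137),
    Packet(0.021, "UDP", "192.168.1.20", 40001, 120),  # other direction
]

kept = filter_call_packets(capture, "192.168.1.10", 40000)
print([(p.timestamp, p.size) for p in kept])  # [(0.0, 140), (0.02, 137)]
```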

Packet time stamps and packet sizes were parsed from the filtered logs on both sides. A comparison between the two log files revealed that the packet sizes were consistent on both sides, but the send and arrival times differed.

During Skype calls, a packet was sent every 0.02 seconds on average. The average packet size was calculated to be 135.7 bytes, 42 bytes of which are a fixed header section (consistent with Ethernet, IPv4 and UDP headers: 14 + 20 + 8 bytes), while the remaining part is Skype’s encrypted data.

I repeated the test several times. Within each attempt, packet sizes were consistent between the caller and receiver sides, but they differed somewhat from one call to the next, despite the same audio file and setup. The table below contains two sets of packet sizes for the same audio segment, taken from two separate Skype calls in which the same file was sent and received. To build this table, I had to shift the data to find the exact place where the two data sets correspond to each other.
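With these averages, about 135.7 - 42 = 93.7 bytes of each packet are encrypted payload. A minimal sketch of how such averages could be computed from the parsed time stamps and sizes is given here; the function name and the sample values are illustrative and not taken from the capture.

```python
HEADER_BYTES = 42  # fixed header size reported in the text


def summarize(timestamps, sizes):
    """Return (mean interval, mean packet size, mean payload size)."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_interval = sum(intervals) / len(intervals)
    mean_size = sum(sizes) / len(sizes)
    return mean_interval, mean_size, mean_size - HEADER_BYTES


# Made-up sample: four packets sent 20 ms apart.
mean_interval, mean_size, mean_payload = summarize(
    [0.0, 0.02, 0.04, 0.06], [140, 137, 139, 136]
)
print(f"interval={mean_interval:.3f}s size={mean_size:.1f}B payload={mean_payload:.1f}B")
```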

Call 1   Call 2
140      137
139      136
136      140
129      123
132      119
124      97
96       91
93       94
98       104
107      99
107      120
121      120
121      135

Table 1: UDP packet sizes comparison for the same audio region in two separate Skype calls.
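The post does not say how the shifting was done. One simple possibility, shown purely as an assumption, is to try a range of offsets and keep the one with the smallest mean absolute difference in packet size. The function name, the `min_overlap` guard and the sample numbers below are hypothetical.

```python
def best_shift(a, b, max_shift, min_overlap):
    """Find the shift of b relative to a that best aligns the two size series.

    Returns (shift, mean absolute difference), where a[i] is compared
    with b[i + shift]. Shifts with fewer than min_overlap overlapping
    packets are skipped so that tiny overlaps cannot win by chance.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        pairs = [
            (a[i], b[i + shift])
            for i in range(len(a))
            if 0 <= i + shift < len(b)
        ]
        if len(pairs) < min_overlap:
            continue
        cost = sum(abs(x - y) for x, y in pairs) / len(pairs)
        if best is None or cost < best[1]:
            best = (shift, cost)
    return best


# Made-up data: call_b is call_a delayed by two packets, plus small noise.
call_a = [140, 139, 136, 129, 132, 124, 96, 93]
call_b = [90, 95, 141, 138, 137, 128, 131, 125, 97, 92]
print(best_shift(call_a, call_b, max_shift=3, min_overlap=5))  # (2, 1.0)
```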

As I mentioned above, packets are not sent at regular intervals, so packet sizes alone give a rather noisy representation of the actual audio waveform (data not shown). Instead, we can approximately match the audio waveform to the sent packets using

$$d_i = \frac{s_i}{t_i - t_{i-1}} \qquad \text{(Eq 1)}$$

where $s_i$ is the size of the $i$th packet and $t_i$ is the time stamp of that packet. This gives the plot below, which shows the amount of data sent per unit time:

Figure 1: The bottom panel shows the actual audio waveform, and the top panel shows the $d_i$ values obtained from Equation 1.

As can be seen in Figure 1, the plotted $d_i$ values roughly follow the general waveform pattern. Although this representation is better than plotting packet sizes along a homogeneous time axis (where time intervals are fixed), it is still very noisy.

Figure 2 shows how the time stamps of consecutive packets vary. The y-axis represents the time interval between successive packets, while the x-axis corresponds to the packet number. An interesting observation is that, while the time intervals differ, they appear to take only a small number of preferred values rather than arbitrary values from a continuous range. That is, the delay can be 0.015 or 0.02 or 0.025 seconds and so on.
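Equation 1 translates almost directly into code. The sketch below assumes the time stamps and sizes are plain Python lists of equal length; the function name and sample values are made up for illustration.

```python
def rate_per_packet(timestamps, sizes):
    """Compute d_i = s_i / (t_i - t_{i-1}) for every packet after the first.

    Packets with a zero or negative interval are skipped to avoid
    dividing by zero, so the result can be shorter than len(sizes) - 1.
    """
    d = []
    for i in range(1, len(sizes)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt > 0:
            d.append(sizes[i] / dt)
    return d


# Made-up sample: intervals of 20 ms, 15 ms and 25 ms.
d_values = rate_per_packet([0.0, 0.02, 0.035, 0.06], [140, 137, 139, 136])
print([round(d) for d in d_values])  # bytes per second, rounded
```

Note how a 139-byte packet arriving after only 15 ms yields a much larger value than a 136-byte packet arriving after 25 ms, which is one source of the noise visible in Figure 1.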

Figure 2: Time differences (y-axis) in seconds between consecutive packets (x-axis) sent, for a small audio segment.
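The clustering of intervals around a few values can be checked with a simple histogram. This sketch assumes a 5 ms grid, which is only a guess based on the example values 0.015, 0.02 and 0.025 seconds; the function name and sample time stamps are hypothetical.

```python
from collections import Counter


def interval_histogram(timestamps, step_ms=5):
    """Count inter-packet intervals after snapping them to a grid.

    Keys are intervals in whole milliseconds, rounded to the nearest
    multiple of step_ms; values are how often each one occurred.
    """
    counts = Counter()
    for a, b in zip(timestamps, timestamps[1:]):
        snapped = round((b - a) * 1000 / step_ms) * step_ms
        counts[snapped] += 1
    return counts


# Made-up sample time stamps in seconds.
hist = interval_histogram([0.0, 0.02, 0.035, 0.06, 0.08, 0.1])
for interval_ms, count in sorted(hist.items()):
    print(f"{interval_ms} ms: {count}")
```

If the intervals really take only a handful of preferred values, nearly all of the counts should fall into a few bins, with little left in between.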

Next, I created “time interval, packet size” tuples for two Skype calls in which I sent the same audio file. Based on an analysis of the entire audio, I restricted the time difference values to 8 pre-determined intervals (Figure 2 shows roughly 6 conserved values, but I think there were 8, looking at a wider range of data). The question: will these delay/size patterns be the same for the two calls? To find the best matching tuple positions, I computed the total Euclidean distance between windows of 200 tuples from the two calls. The best matching positions were not sequential, i.e., the preferred time intervals over the same regions of the two calls were not consistent. In other words, the observed packet time intervals are not reproducible across calls carrying the same audio, and thus they are independent of the data. But as we saw in Table 1, the packet sizes of the same audio segments do show a general, if rough, similarity across two separate Skype calls.

In the next step, I used a longer time window to compute the d values. Even if the time between any two consecutive packets varies, the fluctuations should become negligible over a sufficiently large time span. Instead of working with only two neighboring packets, this time I used groups of 20 packets (denoted by k below) to work out the corrected d values (Eq 2), which gave me the plot in Figure 3b.
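A minimal sketch of the window comparison follows. It interprets “total Euclidean distance” as the sum of per-tuple distances, which is an assumption, and it uses tiny windows and made-up tuples (interval in milliseconds, size in bytes) instead of the 200-tuple windows from the experiment. The function names are hypothetical.

```python
import math


def window_distance(w1, w2):
    """Sum of Euclidean distances between corresponding (interval, size) tuples."""
    return sum(math.dist(a, b) for a, b in zip(w1, w2))


def best_match(call_a, call_b, start, length):
    """Find where call_a[start:start + length] fits best inside call_b.

    Returns (position in call_b, distance at that position).
    """
    window = call_a[start:start + length]
    best_pos, best_cost = None, math.inf
    for pos in range(len(call_b) - length + 1):
        cost = window_distance(window, call_b[pos:pos + length])
        if cost < best_cost:
            best_pos, best_cost = pos, cost
    return best_pos, best_cost


# Made-up tuples: (interval in ms, packet size in bytes).
call_a = [(20, 140), (15, 137), (25, 139), (20, 136)]
call_b = [(20, 100), (20, 141), (15, 137), (25, 138), (20, 120)]
print(best_match(call_a, call_b, start=0, length=3))  # (1, 2.0)
```

Repeating this for successive windows of one call and checking whether the best positions in the other call advance in step is one way to test whether the patterns are consistent between calls.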

$$d_i = \frac{s_{i+k}}{t_{i+k} - t_{i-k}} \qquad \text{(Eq 2)}$$

Figure 3: a) The bottom pane shows the original waveform, b) the middle pane shows the calculated d values over k=20 from a Skype call, c) the top pane is the power plot of the original audio waveform.
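The windowed version can be sketched in the same way. By default the code below follows Equation 2 as printed, using the size of a single packet in the numerator. The `sum_window` option, which divides the total bytes in the window by the window duration instead, is not from the post and is included only as an assumed variant. Function and variable names are hypothetical.

```python
def rate_windowed(timestamps, sizes, k=20, sum_window=False):
    """Compute d_i over a window spanning packets i-k to i+k.

    sum_window=False: d_i = s_{i+k} / (t_{i+k} - t_{i-k}), as in Eq 2.
    sum_window=True:  assumed variant, total bytes sent after t_{i-k}
                      up to and including t_{i+k}, divided by the span.
    Only indices with a full window on both sides are returned.
    """
    d = []
    for i in range(k, len(sizes) - k):
        dt = timestamps[i + k] - timestamps[i - k]
        if sum_window:
            numerator = sum(sizes[i - k + 1:i + k + 1])
        else:
            numerator = sizes[i + k]
        d.append(numerator / dt)
    return d


# Made-up data: 100 packets, 20 ms apart, with slowly varying sizes.
timestamps = [i * 0.02 for i in range(100)]
sizes = [100 + (i % 10) for i in range(100)]
smoothed = rate_windowed(timestamps, sizes, k=20)
print(len(smoothed), round(smoothed[0]))
```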

Notice how the output of Equation 2 echoes the power plot of the original audio (Figure 3)! It seems that, even if Skype communication is encrypted and secure, it may still be possible, for example, to search for certain phrases by comparing the energy pattern of a phrase with the packet size data mapped through this equation. Another application of packet data could be speaker diarisation. The size of the encrypted audio data is closely tied to the compression rate achieved for each speaker's voice. If two speakers have distinct voice characteristics, the corresponding Skype packet sizes could hint at who is speaking when, by following the methodology explained here.
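As a rough illustration of the phrase-search idea, and not a tested attack, one could slide the energy pattern of a known phrase along the d values and look for the position with the highest Pearson correlation. The function names and the numbers below are made up, and the sketch assumes both series have already been resampled to the same time step.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def find_template(series, template):
    """Return (position, correlation) where template matches series best."""
    best_pos, best_r = None, -math.inf
    for pos in range(len(series) - len(template) + 1):
        r = pearson(series[pos:pos + len(template)], template)
        if r > best_r:
            best_pos, best_r = pos, r
    return best_pos, best_r


# Made-up data: the template shape appears in the series at index 3,
# scaled by a factor of 10.
phrase_energy = [1, 3, 2, 5]
d_series = [7, 7, 7, 10, 30, 20, 50, 7, 7]
position, score = find_template(d_series, phrase_energy)
print(position, round(score, 3))
```

Correlation is used here because it ignores overall scale and offset, which matters when comparing audio energy with bytes per second.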

Linux tips – II

Suppose that you want to write a sequence of consecutive numbers to the terminal. There is an alternative to writing a one-liner in Perl, AWK, Bash or your favorite scripting language: the standard Linux command seq. But first, let’s see how we can generate sequential numbers using scripting languages. A typical Bash way of writing the numbers from 3 to 7 (inclusive) is:

for ((i=3; i<8; i++)); do echo $i; done

Although the overall syntax is slightly uglier, the same result can be achieved with Perl:

perl -e 'for ($i=3;$i<=7;$i++) {print "$i\n";}'

However, Linux offers a very elegant way of doing this with “seq”:

seq 3 7

will print these numbers on separate lines: 3, 4, 5, 6, 7.

If you want to print in increments of 2, for instance:

seq 3 2 7

will produce the numbers 3, 5, 7.