Hands-on learning about network protocols using `socat`

Principle -- HTTP -- SMTP -- IMAP -- DNS

This article describes how to observe text-based (ASCII) network protocols live. This is instructive and can also help with troubleshooting communication problems. The central tool is the network relay socat.

socat is a general-purpose network relay. What does that mean? It sets up a bidirectional communications channel between two endpoints, each of which may be a (low-level) network socket, a (higher-level) TCP or UDP port, user I/O, the standard input or output of another process, or a file. And it can print the data it transfers as the communication happens.

The principle

Imagine you want to write a program that speaks a network protocol you do not know well. For example, you might want to automate a web browsing task, fetch your e-mail without user intervention, or simply implement a communications protocol. Many network standards are built on plain text, but interactive programs do not usually let you see what they send to a server. However, many of them support proxies, or at least can be told which server to contact.

The idea is to insert a socat process into the chain of communication, with its print-as-you-transfer option enabled, and observe what client and server send to each other.
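To make the principle concrete, here is a minimal Python sketch of such a logging relay. It is not how socat is implemented; it only illustrates the idea: accept one client, connect to the real server, and copy bytes in both directions while logging each chunk with ">" (client to server) or "<" (server to client). The function names, the 4096-byte buffer and the log format (loosely modelled on socat's own, without timestamps) are my own assumptions.

```python
from __future__ import annotations

import socket
import sys
import threading


def pump(src: socket.socket, dst: socket.socket, marker: str, log) -> None:
    """Copy bytes from src to dst until src reports end of stream.

    Each chunk is logged with a direction marker, its length and its byte
    offsets within this direction's stream; CR is shown as a visible \\r."""
    offset = 0
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            text = chunk.decode("latin-1").replace("\r", "\\r")
            log(f"{marker} length={len(chunk)} from={offset} "
                f"to={offset + len(chunk) - 1}\n{text}")
            offset += len(chunk)
            dst.sendall(chunk)
    except OSError:
        pass  # connection reset by either side: stop relaying this direction
    try:
        dst.shutdown(socket.SHUT_WR)  # pass the end of stream on to the other side
    except OSError:
        pass


def relay_one(listener: socket.socket, target: tuple[str, int],
              log=sys.stderr.write) -> None:
    """Accept one client on listener, connect to target, relay and log both ways."""
    client, _addr = listener.accept()
    server = socket.create_connection(target)
    with client, server:
        up = threading.Thread(target=pump, args=(client, server, ">", log))
        down = threading.Thread(target=pump, args=(server, client, "<", log))
        up.start()
        down.start()
        up.join()
        down.join()
```

Used with a listening socket on a local port and the address of the real server as target, this plays roughly the role socat plays in the examples below. Two threads are the simplest way to relay in both directions at once; socat itself handles many more address types and options.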

View HTTP transfers

This is one of the easiest protocols to intercept, as practically all browsers support HTTP proxies. We will do this in two stages. First, we will only look at the request your browser sends. This is slightly simpler, and the bulk of the retrieved data does not clutter the output.

First set up socat as a TCP server:

socat tcp-listen:1234,reuseaddr -

This tells socat to accept connections on port 1234 of the local computer where it is running. Ordinary users may run servers on ports from 1024 upward, but not on lower port numbers. The reuseaddr option allows the same port to be reused immediately after the program has finished, which is convenient if we want to try this several times. The dash at the end stands for the second endpoint between which socat transfers data, namely the terminal in which it is running. So data arriving at the server port is printed to the terminal. For example, if you set your browser's HTTP proxy to 127.0.0.1, port 1234, and try to access this page using Firefox, you will see something like the following output:

GET http://www.volkerschatz.com/net/socatproc.html HTTP/1.1
Host: www.volkerschatz.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de,en;q=0.7,en-us;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive

The browser will then wait for a server to reply and eventually time out, as socat does not send anything back. The output shows what information your browser sends to the server, such as the user agent string giving details of the browser version, and your local language.
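For comparison, the following Python sketch does roughly what this socat command does for a single connection: it listens on a local port with SO_REUSEADDR set (the socket-level counterpart of the reuseaddr option) and returns what the client sends up to the blank line that ends an HTTP request header, without replying. The function names are my own, and unlike socat it stops after one request header.

```python
import socket


def make_listener(port: int = 1234, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket, similar to socat's tcp-listen:PORT,reuseaddr."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR allows binding the port again right after a previous run.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def capture_request(srv: socket.socket, timeout: float = 10.0) -> bytes:
    """Accept one connection and return the bytes received up to the end of
    the HTTP request header. Nothing is sent back, so a real browser would
    eventually time out, just as with socat."""
    srv.settimeout(timeout)
    conn, _addr = srv.accept()
    with conn:
        conn.settimeout(timeout)
        data = b""
        while b"\r\n\r\n" not in data:  # blank line terminates the header
            chunk = conn.recv(4096)
            if not chunk:  # client closed the connection early
                break
            data += chunk
    return data
```

To try it with a browser, one would call capture_request(make_listener(1234)) and print the result, for instance decoded as Latin-1.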

Now we also want to observe the server reply and the information contained in its header lines. To do that, we have to make socat connect to the web server in question and add the -v option to print out the data being transferred. This leads to the following command line:

socat -v tcp-listen:1234,reuseaddr tcp:volkerschatz.com:80

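Port 80 is the standard port for HTTP servers. We have to give it explicitly because socat is in no way specialised for HTTP; we could just as well connect to a different protocol and port. Note that socat forwards every connection to this one server, whatever URL the browser asks for, so only pages on that server can be loaded this way. If we use another of my web pages as an example, this gives the following output: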

> 2013/02/24 16:29:37.069062  length=338 from=0 to=337
GET http://www.volkerschatz.com/net/net.html HTTP/1.1\r
Host: www.volkerschatz.com\r
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r
Accept-Language: de,en;q=0.7,en-us;q=0.3\r
Accept-Encoding: gzip, deflate\r
Connection: keep-alive\r
\r
< 2013/02/24 16:29:37.139733  length=322 from=0 to=321
HTTP/1.1 200 OK\r
Date: Sun, 24 Feb 2013 15:29:37 GMT\r
Server: Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/0.9.8r\r
Last-Modified: Sat, 01 Dec 2012 13:10:33 GMT\r
ETag: "770f6cb-6dc-4cfca3dbe8fd6"\r
Accept-Ranges: bytes\r
Content-Length: 1756\r
Keep-Alive: timeout=3, max=100\r
Connection: Keep-Alive\r
Content-Type: text/html\r
\r
< 2013/02/24 16:29:37.151512  length=1440 from=322 to=1761
...

The lines starting with ">" or "<" are inserted by socat to indicate the direction, time and size of each transfer, as well as the byte offsets (from, to) of the chunk within the data flowing in that direction. Those starting with ">" were sent by the browser to the server, those with "<" by the server. The first block is the HTTP request we have already seen, only with \r at the end of each line: HTTP header lines end in CR+LF. On a terminal the carriage return is invisible, but the -v log shows it explicitly as \r. The second block is the server reply containing the HTTP status code, information on the server type and version, and the modification date of the page. The third block, which I have omitted, is the content of the online resource, in this case the HTML page.
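When a -v log gets long (socat writes it to standard error), it can help to extract just these header lines. The Python sketch below parses them with a regular expression derived from the examples in this article, not from a format specification, and checks that in each direction length equals to minus from plus one and each chunk starts where the previous one ended. All names are my own.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime

# Derived from lines such as "> 2013/02/24 16:29:37.069062  length=338 from=0 to=337".
# Not anchored to line starts: a header can follow data that lacks a final newline.
HEADER = re.compile(
    r"([<>]) (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"length=(\d+) from=(\d+) to=(\d+)"
)


@dataclass
class Transfer:
    direction: str   # ">" client to server, "<" server to client
    time: datetime
    length: int
    start: int       # offset of the first byte in this direction's stream
    end: int         # offset of the last byte


def parse_headers(log: str) -> list[Transfer]:
    """Return one Transfer per direction/time/size line found in the log."""
    result = []
    for m in HEADER.finditer(log):
        direction, stamp, length, start, end = m.groups()
        result.append(Transfer(direction,
                               datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f"),
                               int(length), int(start), int(end)))
    return result


def offsets_consistent(transfers: list[Transfer]) -> bool:
    """Check that sizes match the offsets and each direction has no gaps."""
    next_start = {">": 0, "<": 0}
    for t in transfers:
        if t.length != t.end - t.start + 1 or t.start != next_start[t.direction]:
            return False
        next_start[t.direction] = t.end + 1
    return True
```

Applied to the SMTP log in the next section, the check succeeds; for a log where parts have been cut out, it reports the gap.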

But have we not cheated a little? We have used the proxy support of our browser, so we cannot really be sure that a direct (non-proxy) request looks the same as one sent to a proxy. To see a direct request, we use the same command line as in the first try above, connecting a TCP server port to the terminal output. Then we disable the proxy in the browser and enter the URL http://127.0.0.1:1234/index.html (or any other document path, since socat does not serve any documents anyway). This causes the browser to contact the socat process as a web server rather than a proxy. The ":1234" specifies the non-standard port number to use instead of 80. The result is:

GET /index.html HTTP/1.1
Host: 127.0.0.1:1234
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de,en;q=0.7,en-us;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive

As you can see, the resource named after the GET keyword no longer includes the http: scheme and the server name; it is just the path. The server name now appears only in the Host header.
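The difference between the two request lines can be expressed as a small conversion. In the terms of the HTTP/1.1 specification (RFC 7230), a request to a proxy uses the absolute-form of the request target, while a direct request normally uses the origin-form, with the server name carried in the Host header. The sketch below, whose function name is my own, converts the first into the second and returns the value the Host header would have.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def to_origin_form(request_line: str) -> tuple[str, str]:
    """Convert a proxy-style request line to a direct one.

    Returns (new request line, host). For a line that is already in
    origin-form, the line is returned unchanged with an empty host."""
    method, target, version = request_line.split(" ")
    parts = urlsplit(target)
    if not parts.scheme:  # e.g. "/index.html": nothing to convert
        return request_line, ""
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # netloc keeps a non-standard port, e.g. "127.0.0.1:1234"
    return f"{method} {path} {version}", parts.netloc
```

For example, to_origin_form("GET http://www.volkerschatz.com/net/net.html HTTP/1.1") gives the line "GET /net/net.html HTTP/1.1" and the host www.volkerschatz.com.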

Observe sending mail via SMTP

Sending mail is the "less dangerous" direction of mail protocols: you can choose to send only test mails and will not risk losing any mail you want to receive. So this is the direction we will start with. As SMTP is based on TCP, we set up a TCP relay with socat as before:

socat -v tcp-listen:1234,reuseaddr tcp:your-smtp-server.foo:25

Here, "your-smtp-server.foo" stands for your regular SMTP server, and 25 is the standard SMTP port. Now you have to tell your mail client to send mail via port 1234 of the local computer. For the text-based mailer pine (and its successor alpine), this is the smtp-server variable in the .pinerc configuration file. Set it to:

smtp-server=127.0.0.1:1234

Then try sending a short test email. The output of socat on my system is:

< 2013/03/09 18:09:24.050035  length=57 from=0 to=56
220 your-smtp-server.foo ESMTP RZmta 31.19 ready\r
> 2013/03/09 18:09:24.050430  length=25 from=0 to=24
EHLO schizo.localdomain\r
< 2013/03/09 18:09:24.093368  length=219 from=57 to=275
250-your-smtp-server.foo greets 77.2.8.254\r
250-ENHANCEDSTATUSCODES\r
250-8BITMIME\r
250-PIPELINING\r
250-DELIVERBY\r
250-SIZE 104857600\r
250-AUTH SCRAM-SHA-1 DIGEST-MD5 CRAM-MD5 LOGIN PLAIN\r
250-STARTTLS\r
250 HELP\r
> 2013/03/09 18:09:24.093928  length=10 from=25 to=34
STARTTLS\r
< 2013/03/09 18:09:24.136984  length=24 from=276 to=299
220 Ready to start TLS\r
> 2013/03/09 18:09:24.137291  length=228 from=35 to=262
...

Unlike with HTTP, here the server starts the communication, with a greeting (reply code 220). The client replies with EHLO rather than HELO, which asks the server to list the protocol extensions it supports, one per line of its 250 reply. (Yes, my computer really has that host name ;D) The mail client then sends STARTTLS to switch to an encrypted connection, and the remainder of the communication, indicated by the ellipsis, is unreadable. Thankfully socat replaces unprintable characters by dots. If you really want to view binary data, you can replace socat's -v option with -x. Then everything except the transfer direction/time/size lines produced by socat is in hexadecimal.
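The multiline 250 reply follows a simple rule from the SMTP specification (RFC 5321): every line but the last has a hyphen after the reply code, and the last one has a space. A hypothetical helper that turns such a reply into a dictionary of extension keywords and their parameters might look like the sketch below. Python's own smtplib does similar parsing when you call ehlo() and stores the result in its esmtp_features attribute.

```python
from __future__ import annotations


def parse_ehlo_reply(reply: str) -> tuple[str, dict[str, list[str]]]:
    """Split a successful multiline EHLO reply into greeting and extensions.

    Every line is "250" followed by "-" if more lines follow, or by a space
    on the last line. Extension keywords are upper-cased; their parameters
    are kept as a list of strings."""
    lines = [ln for ln in reply.replace("\r\n", "\n").split("\n") if ln]
    if not lines or any(not ln.startswith("250") for ln in lines):
        raise ValueError("not a successful (250) EHLO reply")
    if lines[-1][3:4] != " " or any(ln[3:4] != "-" for ln in lines[:-1]):
        raise ValueError("malformed continuation markers")
    greeting = lines[0][4:]  # server name and free text
    extensions: dict[str, list[str]] = {}
    for ln in lines[1:]:
        keyword, *params = ln[4:].split()
        extensions[keyword.upper()] = params
    return greeting, extensions
```

Fed the reply above, this yields eight extensions, among them STARTTLS, SIZE with the parameter 104857600, and AUTH with the list of supported authentication mechanisms.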

Accessing a mailbox with IMAP

The other direction — retrieving mail — is the more dangerous operation, as your mail may be lost if you delete it from the server without properly saving it locally. To be cautious, I will be using my mail backup script in "list" mode for the following experiments. This script never deletes anything, and listing mail on the server is not a dangerous operation. If your mail client can be made to do the same thing (just list mail, no retrieval), you are probably safe trying this out.

As before, we set up our socat relay between the mail (IMAP) server and a local port:

socat -v tcp-listen:1234,reuseaddr tcp:imap.yourprovider.com:143

Then the mail client has to be told to use the local host (127.0.0.1) and port 1234 as its IMAP server. In the case of mailbackup, this is comparatively easy: the options PeerAddr => "127.0.0.1", IMAPPort => 1234, ConnectMethod => "PLAIN" have to be passed to the connect method of the IMAP::Client object. The last option disables encryption so that we can read the communication.

In my example, the output is:

< 2013/03/17 19:52:05.355560  length=208 from=0 to=207
* OK [CAPABILITY IMAP4 IMAP4rev1 AUTH=SCRAM-SHA-1 AUTH=DIGEST-MD5 AUTH=CRAM-MD5 AUTH=LOGIN AUTH=PLAIN ID IDLE SORT QUOTA NAMESPACE UNSELECT UIDPLUS MULTIAPPEND CHILDREN WITHIN XLIST] IMAP server ready (117)\r
> 2013/03/17 19:52:05.356090  length=11 from=0 to=10
0004 NOOP\r
< 2013/03/17 19:52:05.408387  length=24 from=208 to=231
0004 OK NOOP completed\r
> 2013/03/17 19:52:05.408595  length=46 from=11 to=56
0005 LOGIN mailbox@yourprovider.com password\r
< 2013/03/17 19:52:05.459986  length=30 from=232 to=261
0005 OK User logged in (344)\r
> 2013/03/17 19:52:05.460176  length=22 from=57 to=78
0006 EXAMINE "INBOX"\r
< 2013/03/17 19:52:05.525841  length=278 from=262 to=539
* FLAGS (\\Answered \\Flagged \\Deleted \\Seen \\Draft \\Forwarded)\r
* 181 EXISTS\r
* 181 RECENT\r
* OK [UNSEEN 1]\r
* OK [PERMANENTFLAGS (\\* \\Answered \\Flagged \\Deleted \\Seen \\Draft \\Forwarded)]\r
* OK [UIDNEXT 778]\r
* OK [UIDVALIDITY 1199367148]\r
0006 OK [READ-ONLY] EXAMINE completed\r
> 2013/03/17 19:52:05.526531  length=39 from=79 to=117
0007 FETCH 1:* (ENVELOPE RFC822.SIZE)\r
< 2013/03/17 19:52:05.581956  length=20 from=540 to=559
* 1 FETCH (ENVELOPE < 2013/03/17 19:52:05.590854  length=315 from=560 to=874
...
0007 OK FETCH complete\r
> 2013/03/17 19:52:06.664428  length=13 from=118 to=130
0008 LOGOUT\r
< 2013/03/17 19:52:06.715562  length=40 from=58685 to=58724
* BYE Logout\r
0008 OK LOGOUT completed\r

As with SMTP, the server initiates the communication. Each client command starts with a tag (0004, 0005, ...), and the server's final reply to that command carries the same tag; lines starting with "*" are untagged data sent along the way. In command 0005, the login data are transmitted in plain text because we disabled encryption (obviously this is not normally a good idea). The next command sent by my script, EXAMINE, opens the mailbox read-only and reports its status, such as the number of messages (181 EXISTS). It is followed by the retrieval of the mail envelopes (metadata) with the FETCH command. This returns 181 FETCH replies, each containing the mail metadata as space-separated quoted strings grouped by parentheses. I have left out all the actual envelopes and all but the first FETCH reply line. The client program then logs out and closes the connection.
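The tagging scheme is easy to reproduce. The Python sketch below uses hypothetical names and does no quoting of arguments, which real IMAP clients must do for strings containing spaces or special characters. It numbers commands the way my script does in this log and finds the tagged status line for a given command. As a consistency check, the encoded commands have exactly the lengths socat reported, for example 46 bytes for the LOGIN line. For real work, Python's imaplib module handles tags, quoting and responses; its select() method with readonly=True sends EXAMINE.

```python
from __future__ import annotations

from itertools import count


class TaggedCommands:
    """Hypothetical helper: prefixes IMAP commands with sequential 4-digit tags."""

    def __init__(self, first: int = 1) -> None:
        self._numbers = count(first)

    def command(self, text: str) -> tuple[str, bytes]:
        """Return (tag, wire bytes). Arguments are NOT quoted or escaped."""
        tag = f"{next(self._numbers):04d}"
        return tag, f"{tag} {text}\r\n".encode("ascii")


def tagged_status(tag: str, reply: str) -> str | None:
    """Return OK, NO or BAD from the line completing the command with this tag.

    Untagged lines (starting with "*") carry data, not the final status."""
    for line in reply.splitlines():
        if line.startswith(tag + " "):
            return line.split(" ", 2)[1]
    return None


def exists_count(reply: str) -> int | None:
    """Extract the message count from an untagged "* N EXISTS" line."""
    for line in reply.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "*" and parts[2] == "EXISTS":
            return int(parts[1])
    return None
```

Starting the counter at 4 and sending NOOP, LOGIN, EXAMINE, FETCH and LOGOUT reproduces the tags 0004 to 0008 seen above.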

A UDP example: name server lookup

To give you an example that is not based on TCP, let's look at name server lookups, which (usually) use UDP. As we will see, this is a binary protocol. This time we have to use socat to set up a UDP relay:

socat -x udp-listen:1234,reuseaddr udp:000.000.000.000:53

Note the -x option to output a hex dump. Replace 000.000.000.000 with the numerical IP address of your name server, as listed in /etc/resolv.conf. To perform the lookup, one can use the dig program. (The host program does not let you specify the port to contact.) The following command line makes dig send its query to the relay we have set up:

dig @127.0.0.1 -p 1234 volkerschatz.com

The IP after the @ is the name server to contact, and the number after -p is the port; in this case they refer to the local port of the socat relay. The hex dump of the communication is:

> 2013/03/17 20:23:19.001936  length=45 from=0 to=44
 49 66 01 20 00 01 00 00 00 00 00 01 0c 76 6f 6c 6b 65 72 73 63 68 61 74 7a 03 63 6f 6d 00 00 01 00 01 00 00 29 10 00 00 00 00 00 00 00
< 2013/03/17 20:23:19.046846  length=61 from=0 to=60
 49 66 81 80 00 01 00 01 00 00 00 01 0c 76 6f 6c 6b 65 72 73 63 68 61 74 7a 03 63 6f 6d 00 00 01 00 01 c0 0c 00 01 00 01 00 00 1c 10 00 04 51 a9 91 46 00 00 29 0f a0 00 00 00 00 00 00

Pretty much the only thing one can easily discern is the host name volkerschatz.com, whose characters start with 76 6f 6c 6b ... (run man ascii to display the ASCII code table). Interestingly, the name contains no dots. Instead, each part (label) of the name is preceded by a byte giving its length: 0c (12) before volkerschatz and 03 before com, and a zero byte ends the name.
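To see how the rest of those bytes fit together, here is a Python sketch that builds the query by hand and decodes the reply, following the message format of RFC 1035 and the EDNS(0) OPT record of RFC 6891 that dig appended (the last 11 bytes of each message). The query ID 0x4966 and the flag bits 0x0120 (recursion desired plus the AD bit) are copied from the dump above; the function names are my own, and the decoder ignores the authority and additional sections.

```python
from __future__ import annotations

import struct

TYPE_A, CLASS_IN, TYPE_OPT = 1, 1, 41


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, query_id: int, flags: int = 0x0100,
                edns_size: int | None = None) -> bytes:
    """Build a query for the A record of name, optionally with an EDNS(0) OPT record."""
    arcount = 0 if edns_size is None else 1
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each).
    msg = struct.pack("!6H", query_id, flags, 1, 0, 0, arcount)
    msg += encode_name(name) + struct.pack("!2H", TYPE_A, CLASS_IN)
    if edns_size is not None:
        # OPT pseudo-record: root name, type 41, class = UDP payload size,
        # 32-bit TTL field (extended RCODE, version, flags) = 0, no option data.
        msg += b"\x00" + struct.pack("!HHIH", TYPE_OPT, edns_size, 0, 0)
    return msg


def read_name(msg: bytes, pos: int) -> tuple[str, int]:
    """Read a name at pos, following compression pointers; return (name, next pos)."""
    labels = []
    resume = None  # where parsing continues after the first pointer
    for _ in range(128):  # guard against pointer loops in malformed messages
        length = msg[pos]
        if length & 0xC0 == 0xC0:  # pointer: low 14 bits are an offset into msg
            if resume is None:
                resume = pos + 2
            pos = struct.unpack_from("!H", msg, pos)[0] & 0x3FFF
            continue
        pos += 1
        if length == 0:
            return ".".join(labels), (pos if resume is None else resume)
        labels.append(msg[pos:pos + length].decode("ascii"))
        pos += length
    raise ValueError("name too long or pointer loop")


def parse_response(msg: bytes) -> dict:
    """Decode header, questions and answers; remaining sections are ignored."""
    qid, flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!6H", msg, 0)
    pos = 12
    questions = []
    for _ in range(qdcount):
        name, pos = read_name(msg, pos)
        questions.append((name, *struct.unpack_from("!2H", msg, pos)))
        pos += 4
    answers = []
    for _ in range(ancount):
        name, pos = read_name(msg, pos)
        rtype, _rclass, ttl, rdlen = struct.unpack_from("!HHIH", msg, pos)
        pos += 10
        rdata = msg[pos:pos + rdlen]
        pos += rdlen
        if rtype == TYPE_A and rdlen == 4:
            rdata = ".".join(str(b) for b in rdata)  # IPv4 address
        answers.append((name, rtype, ttl, rdata))
    return {"id": qid, "flags": flags, "rcode": flags & 0x000F,
            "questions": questions, "answers": answers}
```

With these values, build_query reproduces the 45-byte query above byte for byte, and parse_response finds one A record for volkerschatz.com with a TTL of 7184 seconds and the address 81.169.145.70 (the bytes 51 a9 91 46). The c0 0c at the start of the answer is a compression pointer back to the name at offset 12, which is why the name is not repeated. To send such a query through the relay, one could use a UDP socket (socket.SOCK_DGRAM) and sendto() the bytes to ("127.0.0.1", 1234).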