Tuesday, March 16, 2010

Background of Email

The most common mistakes I hear or read about email usually begin with people talking about POP3 and IMAP, so let’s clear those off the table: POP3 and IMAP have absolutely nothing to do with moving email from the sender to the recipient’s mail server, so get that notion out of your head (if it’s in there). They only come into play afterwards, when an email program fetches messages that are already sitting in a mailbox. Email is sent and received using SMTP, which stands for Simple Mail Transfer Protocol.
It’s a good name, because getting email from one place to another really IS simple, and it works a lot like sending a snail mail letter in real life:
Step 1: You open up your email program, and hit the button to compose a new email, and you write up the latest gossip about your new job.
In real life, this is the same thing as sitting down at a desk, getting out a piece of paper, and handwriting the same letter.
Step 2: You want to send this to your mother, so you put in her e-mail address, mary@ToVille.com, in the “To” field and hit Send. Your email program automatically fills in your “From” address, which happens to be “gr8gonzo@FromTown.com”.
In real life, you put the letter into an envelope, write her full mailing address on the outside, and write in your address in the “From” section.
Step 3: After you hit the Send button, you might wait for a few seconds while it says it is Sending. Behind the scenes, your email program is talking in code to your mail server and handing it your letter.
In real life, this is the same as taking the envelope to the nearest post office and dropping it off in their outgoing mail slot.
Step 4: Now your mail server takes the email message, and looks at the “To” address, which happens to be mary@ToVille.com, and separates the address into the recipient’s name (“mary”) and a domain (“ToVille.com”). Now, your mail server says, “Well, -I- only handle email addresses that end in @FromTown.com, so I need to send this message over to a mail server that handles email addresses that end in @ToVille.com.”
In real life, your post office, which happens to be in FromTown, VA 22033, looks at the city, state, and zip code in the “To” address on the envelope. The envelope should go to your mother, who lives in ToVille, TX. The post office in Virginia can’t just send a postal truck from Virginia all the way to Texas to deliver the letter straight to your mother, so they say, “We have to send this envelope over to a post office in ToVille, TX. That post office will know the physical location of Mary’s mailbox.”
Step 5: Your email server performs a special lookup that returns a list of servers like mail1.ToVille.com and mail2.ToVille.com that handle emails that end in @ToVille.com. (Side note: this special lookup is a DNS lookup, and the mail server records are called MX records.) It then tries to connect to each one until it gets through, and once it connects, it sends your original message to that server (with one minor addition – it updates some of the hidden header information in your message to record the fact that it handled the message).
In real life, the FromTown, VA post office puts a stamp on the letter that just lets the recipient know that they received the letter on a certain date. The post office has a big truck that carries mail between different US states / post offices, so the envelope gets put onto that truck, and it makes its way over to the ToVille, TX post office.
Step 6: Now your email is on the server (we’ll say that it’s mail2.ToVille.com), which handles that “To” email address, so it says, “Okay – I AM the right server to be receiving this email, so I’m just going to save it on my hard drive here. My job is done.” Later, your mother will check her email and her email program will download the new message.
In real life, the post office in ToVille, TX gets the envelope and says, “Yup, we’re the right post office for this letter. Mary happens to have a mailbox right here, so let’s just put the envelope into her mailbox, and our job is done.” Later, Mary will come to the post office and open her mailbox and see the letter.
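Steps 4 and 5 can be sketched in a few lines of Python. This is only a rough illustration, not how any particular mail server is written: the MX table is hard-coded instead of coming from a real DNS lookup, and the names split_address, next_hops, and FAKE_MX_TABLE are made up for the example. The preference numbers are a detail not covered above: each MX record carries one, and the lowest number is tried first.

```python
def split_address(address):
    """Split 'mary@ToVille.com' into the recipient name and a lowercased domain."""
    local, at_sign, domain = address.rpartition("@")
    if not at_sign or not local or not domain:
        raise ValueError(f"not a usable email address: {address!r}")
    return local, domain.lower()


# Hypothetical stand-in for the DNS lookup: domain -> list of (preference, host).
FAKE_MX_TABLE = {
    "toville.com": [(20, "mail2.ToVille.com"), (10, "mail1.ToVille.com")],
}


def next_hops(address, my_domain="fromtown.com", mx_table=FAKE_MX_TABLE):
    """Return the servers to try, in order, or [] if the mail is ours to store."""
    _, domain = split_address(address)
    if domain == my_domain:
        return []  # "I AM the right server", so there is nowhere else to send it
    # Sorting the (preference, host) pairs puts the lowest preference first.
    return [host for _, host in sorted(mx_table.get(domain, []))]


print(split_address("mary@ToVille.com"))
print(next_hops("mary@ToVille.com"))
```

A real server would then try to connect to each host in that order until one of them accepts the message.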
In 6 simple steps, you’ve gotten an email and a letter from one place to another. Now let’s look at some different scenarios:
1. Multiple Mail Servers
Sometimes more than two mail servers will be involved. For example, when I’m hired to set up a mail system, I usually have two servers. The first one initially receives the mail and does some spam filtering and virus-checking, and then sends the message to the second mail server that actually contains the person’s mailbox and is responsible for storing the message.
2. Reading Mail Headers (Received)
By looking at a message’s header, you can see how many servers were involved in the delivery process. Here’s an example:
Received: from aserver.experts-exchange.com (xx.xxx.xxx.164)
by myserver.com with SMTP; 19 Oct 2009 15:01:14 -0000
Received: from aserver.experts-exchange.com (localhost [127.0.0.1])
by aserver.experts-exchange.com (8.14.3/8.14.3) with ESMTP id n9JF1DmL003430
for <me@myserver.com>; Mon, 19 Oct 2009 08:01:13 -0700 (PDT)
(envelope-from noreply@experts-exchange.com)
That might look complicated, but it’s really not. The headers are listed from the newest event to the oldest, so if we flip the order, we start with:
Received: from aserver.experts-exchange.com (localhost [127.0.0.1])
by aserver.experts-exchange.com (8.14.3/8.14.3) with ESMTP id n9JF1DmL003430
for <me@myserver.com>; Mon, 19 Oct 2009 08:01:13 -0700 (PDT)
(envelope-from noreply@experts-exchange.com)
So on October 19th, 2009 at approximately 8 AM, someone on the server called aserver.experts-exchange.com created an email that was going to be sent to me@myserver.com, and they said the “from” address was noreply@experts-exchange.com. The email was created on the same server that was going to send the mail, so we see that it came from:
Received: from aserver.experts-exchange.com (localhost [127.0.0.1])
And was handed off to a mailing program that was running on the same server:
by aserver.experts-exchange.com (8.14.3/8.14.3) with ESMTP id n9JF1DmL003430
Now we look at the next header:
Received: from aserver.experts-exchange.com (xx.xxx.xxx.164)
by myserver.com with SMTP; 19 Oct 2009 15:01:14 -0000
This just says that the aserver.experts-exchange.com server handed the message to myserver.com within 1 second (the 15:01:14 -0000 timestamp = 08:01:14 -0700, and the message was originally sent at 08:01:13 -0700).
That’s how you can see what servers were involved in your message’s delivery.
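If you would rather not flip the headers and do the time zone math by hand, the same walk-through can be sketched with Python's standard email package. Treat this as a minimal sketch: the function name delivery_hops is made up, the Subject line and body are placeholders added so the example is a complete message, and it assumes the timestamp is whatever follows the last semicolon in each Received header (which is true for the two headers shown here).

```python
from email import message_from_string
from email.utils import mktime_tz, parsedate_tz

# The two Received headers from the example, plus a placeholder Subject and body.
# Continuation lines start with a space, which is how long headers are folded.
RAW_MESSAGE = """\
Received: from aserver.experts-exchange.com (xx.xxx.xxx.164)
 by myserver.com with SMTP; 19 Oct 2009 15:01:14 -0000
Received: from aserver.experts-exchange.com (localhost [127.0.0.1])
 by aserver.experts-exchange.com (8.14.3/8.14.3) with ESMTP id n9JF1DmL003430
 for <me@myserver.com>; Mon, 19 Oct 2009 08:01:13 -0700 (PDT)
 (envelope-from noreply@experts-exchange.com)
Subject: placeholder subject

placeholder body
"""


def delivery_hops(raw_message):
    """Return (description, utc_seconds) pairs, oldest hop first."""
    message = message_from_string(raw_message)
    hops = []
    # Each server adds its Received header at the top, so reverse the list.
    for value in reversed(message.get_all("Received", [])):
        description, _, stamp = value.rpartition(";")
        parsed = parsedate_tz(stamp)
        if parsed is None:
            continue  # skip a header whose date cannot be read
        # mktime_tz applies the time zone offset and returns seconds in UTC.
        hops.append((" ".join(description.split()), mktime_tz(parsed)))
    return hops


hops = delivery_hops(RAW_MESSAGE)
for description, utc_seconds in hops:
    print(utc_seconds, description)
print("seconds between hops:", hops[1][1] - hops[0][1])
```

Converting both timestamps to UTC seconds is what makes the 15:01:14 -0000 and 08:01:13 -0700 values directly comparable.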
3. Other Mail Headers
There are a variety of other headers that can contain some information about the message. Some headers are almost always there, like “To”, “From”, “Subject”, and “Date”, while other headers might come from the mail server or from the email program used. Many email programs like to advertise themselves by adding an email header that says that the email was written using that program. Some mail servers might check a message for spam and viruses and add a header to indicate that a message is clean (or maybe one to say it’s not clean). The important thing to know is that, in practice, nearly all headers are optional. The mail standard does require “From” and “Date”, but if you leave off the “Date” header, the mail server will often add it in. Still, headers can sometimes be useful for debugging mail problems.
4. Bounces
You can send a message to bittyboombam79639763976@gmail.com, but that doesn’t mean that it’s a valid recipient. Still, if you send a message there, it will make its way over to gmail.com’s mail servers, which will then check to see if bittyboombam79639763976 is actually a real recipient. If it’s not a valid address, or maybe if the mailbox is full, then gmail.com’s mail servers will sometimes generate a “bounce” or an NDR (Non-Delivery Report) that says that it could not deliver your message to that address, and it will usually give you the reason.
One side note to this is that with all the spam tricks that go on nowadays with fake “from” addresses, some mail servers aren’t sending bounces anymore. If it’s not a valid recipient, they might just delete the message and not tell you anything. It really depends on the mail servers (or who set them up).
5. Direct Mailing
Remember the real-life explanation of post office mail above? Imagine if you didn’t drop your letter off at your local post office in FromTown, VA, but INSTEAD jumped into your car and took a long road trip down to ToVille, TX. Once you got there, you handed the envelope to the post office right there. It’s definitely possible to do, and you would have a few benefits (faster delivery, for example), but it may not be worth the drive.
However, in the email world, the “road trip” can just be a matter of milliseconds. All of the tools needed to look up the right mail server and connect directly to it are freely available, so this all makes it nearly trivial to directly connect to the recipient’s mail server and send your message(s). This is a tactic of many mass mailer programs, and can be useful because having a direct connection to the “final” server can also give you immediate notification of failed and/or successful delivery.
On the flip side, direct mailing is most often used by spammers who don’t want to risk their illegitimate emails being blocked by a single server. As a result, most major internet service providers will block you from using direct mailing methods. They force you to send all your email through their own servers (and if you begin spamming, they can easily flip a switch and block all of your messages from going out). Usually those same providers will have a “business” connection that is more expensive, but allows you to do direct mailing.
All that said, even messages sent from “business” connections can still be easily flagged as spam by many antispam programs. A mail administrator should know the technical steps (things like properly configured reverse DNS) to take in order to allow mail to be sent without getting flagged as direct mail spam, so don’t plan on doing any mass mailings until you’ve taken those proper steps, and don’t try to use cheap mass-mailing programs to avoid costs associated with commercial email campaigns.
6. Fake “From” Addresses
While we’re on the topic of spam, there’s one thing that almost everyone notices when they learn how to manually send e-mail using SMTP (something you’ll learn a bit further down in this article). Part of the SMTP protocol involves you telling the mail server what your email address is (the “from” address). There’s USUALLY no validation done on this to make sure that you’ve typed in your own e-mail address. This means that you could type in any valid e-mail address as if it were your own. Likewise, someone else could easily send an email pretending to be from you by filling in YOUR email address as the “from” address. This is why so much spam seems to come from bogus e-mail addresses or even from an email address on your own domain!
Luckily, there are some antispam measures out there that can do a bit of detective work to find out which “from” addresses are real or fake. It’s not perfect, but it does cut down on spam, and it’s getting better every day. There are also more and more mail servers that require authentication (username and password) to be able to send e-mail out, which further helps control fake sender addresses. Ultimately, using a “fake” from address is never a good idea if you actually want your mail to be delivered, but it’s good to know WHY you shouldn’t do something.
7. Security Concerns
One downside to SMTP is that it is not a secure protocol. From the moment you press Send, your message is being transferred from computer-to-computer, and every computer that is involved in the delivery process has the ability to see your entire email. Anyone that runs any of those servers can flip a switch and start saving copies of all the email, and then read through them later, so be VERY careful of what you send through email. Never send any information that you wouldn’t be willing to give to a complete stranger (because technically that’s exactly what you’re doing).
There are some ways of addressing this problem. You can encrypt your emails using a variety of methods, the most popular methods being PGP (Pretty Good Privacy, http://www.pgp.com/) and GPG (GNU Privacy Guard, http://www.gnupg.org/). The former is a commercial solution and the latter is a free, open-source alternative; both are usually well-supported by most popular e-mail programs.
If you don’t have either of those installed but need to send something securely, the best alternative is to save the sensitive information into a text file, then put that text file into an encrypted, password-protected ZIP file, and simply attach the ZIP file. This usually works, but due to the increasing number of viruses being spread through password-protected ZIP files, the email might just get blocked entirely. If that’s the case, post the ZIP file onto a web server somewhere and send the URL to the recipient. Once they’ve downloaded the ZIP file, delete it from the web server. Never send the password along with the ZIP file – always try to communicate the password by phone or via some other method. Otherwise, it’s like locking up your valuables in a safe, and then taping a piece of paper with the combination to the safe.
8. Technical Trivia
The standard network port for sending mail to mail servers (using SMTP) is port 25. There are also ways of running secure versions of SMTP, so the process of sending your message to the server is secured. Secure SMTP is commonly offered on port 465 (the connection is encrypted from the start) or on port 587 (the connection starts unencrypted and is then upgraded with the STARTTLS command). NOTE: connecting securely to a mail server only encrypts the original transmission of the message from your computer to your mail server – it does not actually encrypt the message itself, nor does it encrypt any further transmissions.
9. Message Structure – Headers and Body
A basic email message is very straightforward. The message is split into two parts – all of the headers and all of the body. Headers have the form of a name followed by a colon and a space, and then a value for that header. For example, your basic subject line header looks like this:
Subject: Hello world!
The message is split into pieces by the use of a single blank line. So all of the data leading up to that single blank line is considered headers, and all of the data AFTER that single blank line is considered the message body. Pretty simple stuff.
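To make that concrete, here is a small Python sketch that splits a raw message at the first blank line. It is deliberately simplified: the function name split_message and the sample message are made up for illustration, it assumes plain "\n" line endings (real messages on the wire use "\r\n"), and it ignores long headers that are folded across several lines.

```python
RAW_MESSAGE = (
    "From: gr8gonzo@FromTown.com\n"
    "To: mary@ToVille.com\n"
    "Subject: Hello world!\n"
    "\n"                      # the single blank line that ends the headers
    "Hi Mom,\n"
    "\n"                      # blank lines after that point are just body text
    "The new job is great.\n"
)


def split_message(raw):
    """Return (list of (name, value) header pairs, body text)."""
    # partition() cuts at the FIRST blank line only; the rest is the body.
    header_block, _, body = raw.partition("\n\n")
    headers = []
    for line in header_block.splitlines():
        name, _, value = line.partition(":")
        headers.append((name, value.strip()))
    return headers, body


headers, body = split_message(RAW_MESSAGE)
for name, value in headers:
    print(f"{name!r} -> {value!r}")
print(repr(body))
```

For real mail, Python's built-in email package does this same job and also copes with folded headers and encodings.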
10. Complex Messages – Attachments, Text and HTML, and Mimes, Oh My!
Basic, text-only emails were good for a while. Then someone figured out that attachments could be extremely useful. The problem was that attachments were made up of all sorts of binary data, and that it was hard to put good ol’ raw data straight into an email message without potentially messing up the message itself. So the first step to solving that problem was to turn the nasty raw data (all that gibberish) into an email-friendly format. Now fast-forward to today, where the most common method of doing that task is to use something called Base64. Without getting into specifics, encoding raw data using Base64 creates a big block of letters and numbers that looks like:
U01UUA0KDQpUaGUgbW9zdCBjb21tb24gbWlzdGFrZXMgSSBoZWFyIG9==
…although usually much longer. The important thing was that none of the characters were gibberish – they were all letters and numbers and characters that you could easily find in a normal message. This meant that the data could be transmitted along with an email message without any trouble, and the recipient could use a Base64-decoder to turn all those letters and numbers back into the original, raw data again.
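Here is a quick round trip using Python's standard base64 module, as a minimal sketch. The sample bytes are made up for the example; any file's contents would work the same way.

```python
import base64

# A few bytes that are NOT safe to paste into a text message, followed by "SMTP".
raw_data = bytes([0, 255, 16, 128]) + b"SMTP"

encoded = base64.b64encode(raw_data)   # bytes made up only of ASCII characters
decoded = base64.b64decode(encoded)    # back to the original raw data

print(encoded.decode("ascii"))
print(decoded == raw_data)
```

The encoded text only ever uses letters, digits, "+", "/", and "=" for padding, which is why it survives a trip through email untouched.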
Of course, imagine if every time you got an email, you had to copy all that Base64-encoded data, put it into a decoder, and then run it to produce the original file again! People would never put up with having to do that! As a result, MIME (Multipurpose Internet Mail Extensions) was born. MIME is really just an add-on to the normal structure of an email. Instead of the simple header/body structure, a MIME message is simply a normal email message where the body has been divided into different pieces.
Imagine a big freeway that just has one big lane. That big lane is your message body. Turning a message into a MIME message is the equivalent of painting some more lines on the freeway to create more lanes. You’re still using the same big block of concrete, but the lanes now organize all the traffic a bit better, and now you can even specify that some lanes hold special traffic. For example, you can put a sign over one lane to designate it as an HOV / carpool lane, and another lane as a truck-only lane.
MIME is exactly the same – you’re taking the body part of the message and splitting it up into different pieces using boundaries. Each piece can be designated to hold certain types of content. One piece might hold the text version of a message, while another piece holds the HTML version of the message, and yet another piece holds the Base64-encoded data of an attachment. You can have as many pieces as you want, and each piece can hold any type of data. It’s up to the email reader to know what to do with each piece.
By using MIME and specifying certain pieces of the body as Base64-encoded data, an email-reading program can now automatically take that data and decode it for you, and then present the file to you to make it easy to work with attachments.
Presto – the problem of sending files along with emails is solved using a combination of “friendly” encoding, and simple boundaries to tell the email program what part of the body is an encoded attachment.
You may have assumed this already, but HTML can also be a piece of the MIME message, and a good e-mail reader can show you the nice HTML version of the message. 
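To see those pieces for yourself, here is a minimal sketch that uses Python's standard email package to build a message with a text piece, an HTML piece, and an attachment, and then lists the pieces. The function name build_message, the file name data.bin, and the message wording are all made up for the example; the library picks the boundaries and the Base64 encoding on its own.

```python
from email.message import EmailMessage


def build_message(attachment_bytes):
    """Build a MIME message with text, HTML, and one binary attachment."""
    message = EmailMessage()
    message["From"] = "gr8gonzo@FromTown.com"
    message["To"] = "mary@ToVille.com"
    message["Subject"] = "Hello world!"
    # Piece 1: the plain text version of the message.
    message.set_content("Hi Mom, the new job is great.")
    # Piece 2: the HTML version of the same message.
    message.add_alternative("<p>Hi Mom, the new job is <b>great</b>.</p>", subtype="html")
    # Piece 3: raw binary data, which the library encodes for us.
    message.add_attachment(
        attachment_bytes,
        maintype="application",
        subtype="octet-stream",
        filename="data.bin",
    )
    return message


message = build_message(bytes(range(8)))
for part in message.walk():
    # Container pieces are "multipart/..."; the rest hold the actual content.
    print(part.get_content_type(), part.get("Content-Transfer-Encoding"))

# Printing the message shows the boundaries and the Base64 block as sent.
print(message.as_string())
```

An email reader does the reverse: it walks the pieces, shows the HTML or text one, and decodes the attachment piece back into a file.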
