Base64 encoding explained

Base64 is a binary-to-text encoding scheme that represents arbitrary binary data using a set of 64 printable ASCII characters. The core idea is simple: take binary data that might contain bytes hostile to text-based protocols (null bytes, control characters, anything above 127), and re-encode it so every byte in the output is a safe, printable character. The cost is a ~33% increase in size -- three input bytes become four output characters [1].

It's one of those things that sounds trivial until you realize just how many systems depend on it. Email attachments, data URIs in HTML, JWT tokens, HTTP Basic Authentication headers, PEM certificates -- Base64 is the quiet workhorse behind all of them.

Where it came from

The earliest standardized version of what we now call Base64 appeared in RFC 989, published in February 1987 by John Linn of BBN Corp [2]. That spec dealt with Privacy Enhanced Mail (PEM) -- a way to encrypt and authenticate email messages. The problem PEM had to solve was that email systems in the 1980s could only reliably transport 7-bit ASCII text. Binary data (encrypted message bodies, digital signatures) would get mangled or stripped by mail transfer agents along the way.

Linn's solution was an encoding that mapped every 6 bits of input to one of 64 ASCII characters. The PEM specs went through several revisions -- RFC 1040 in 1988, RFC 1113 in 1989, and finally RFC 1421 in February 1993 [3] -- but the Base64 encoding itself remained largely the same.

MIME then brought Base64 to mainstream email. First specified in 1992 (RFC 1341), the standard took its current form in November 1996 with RFC 2045 by Ned Freed and Nathaniel Borenstein, which defines Base64 as one of its content transfer encodings [4]. That's the version most people encounter first -- the one that lets you attach a PDF to an email.

How the algorithm works

The encoding operates on groups of three bytes at a time. Three bytes is 24 bits. Those 24 bits get split into four 6-bit groups. Each 6-bit group (a value from 0 to 63) is then used as an index into the Base64 alphabet:

  • Values 0--25: A through Z
  • Values 26--51: a through z
  • Values 52--61: 0 through 9
  • Value 62: +
  • Value 63: /

That's 64 characters total, plus = for padding [1].

[Figure: Base64 encoding process. Three ASCII bytes are split into four 6-bit groups, each mapped to a character. Caption: Encoding the string 'Man' into Base64]

Take the string "Man". The ASCII values are 77, 97, 110. In binary that's 01001101 01100001 01101110 -- 24 bits. Split those into four 6-bit chunks: 010011 (19), 010110 (22), 000101 (5), 101110 (46). Look each one up in the alphabet: index 19 is T, 22 is W, 5 is F, 46 is u. So "Man" encodes to TWFu.
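The walk-through above can be sketched in a few lines of Python. This is a minimal illustration, not a full encoder: `ALPHABET` and `encode_group` are names made up for this example, and the function only handles exactly three bytes.

```python
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the three bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Peel off four 6-bit indices, most significant group first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
```

The shifts 18, 12, 6, and 0 correspond to the four 6-bit chunks (19, 22, 5, 46) from the example.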

Padding

Things get interesting when the input length isn't a multiple of three. If there's one byte left over (8 bits), the encoder pads it with zero bits to 12 bits, producing two Base64 characters, then appends ==. If two bytes are left (16 bits), it pads to 18 bits, outputs three characters, and appends a single = [1].

So "Ma" (two bytes) encodes to TWE=, and "M" (one byte) encodes to TQ==. The padding tells the decoder exactly how many bytes to expect at the tail end.
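To make the padding rule concrete, here is a rough sketch of a complete encoder checked against Python's standard `base64` module. The name `b64encode_sketch` is hypothetical, and the code favors readability over speed.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_sketch(data: bytes) -> str:
    """Illustrative Base64 encoder with '=' padding (not optimized)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Fill the short group with zero bytes so it is always 24 bits wide.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Keep only the characters that carry real data, then pad with '='.
        out.append("".join(chars[:4 - missing]) + "=" * missing)
    return "".join(out)

for sample in (b"Man", b"Ma", b"M"):
    encoded = b64encode_sketch(sample)
    # Cross-check against the standard library implementation.
    assert encoded == base64.b64encode(sample).decode("ascii")
    print(sample, "->", encoded)
```

The number of `=` characters equals the number of bytes missing from the final group, which is how a decoder knows how many bytes to drop at the tail.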

The 33% overhead

The math is straightforward: every 3 bytes of input produce 4 characters of output, and each character occupies one byte in ASCII. That's a ratio of 4/3, or roughly 33.3% expansion [4]. For a 1 MB file, expect about 1.33 MB of Base64 text. It's not great, but it's not terrible either -- the entire point is compatibility, not efficiency.

Hex encoding (Base16), by comparison, doubles the size -- every byte becomes two hex characters [1]. Base32 sits in between, with a 60% overhead (five bytes become eight characters). So Base64 is the most space-efficient of the three encodings standardized in RFC 4648.
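The 4/3 ratio turns into an exact length formula once padding is taken into account: every started group of three bytes costs four characters. The helper name `base64_encoded_length` below is an assumption for this sketch, which checks the formula against the standard library.

```python
import base64

def base64_encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output (no line breaks) for n_bytes of input."""
    # Round the input up to a whole number of 3-byte groups, 4 characters each.
    return 4 * ((n_bytes + 2) // 3)

for size in (0, 1, 2, 3, 1000, 1_000_000):
    actual = len(base64.b64encode(bytes(size)))
    assert actual == base64_encoded_length(size)
    print(size, "->", actual)
```

Line-wrapped variants such as MIME add a little more on top of this for the line terminators.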

Where Base64 shows up

Email (MIME)

This was the original killer app. RFC 2045 specifies Base64 as a Content-Transfer-Encoding for MIME message bodies [4]. When you attach an image to an email, your client Base64-encodes the binary file, splits it into lines of no more than 76 characters (a requirement from the MIME spec), and embeds it in the message body. The recipient's client reverses the process.
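Python's standard library exposes this MIME-style wrapping as `base64.encodebytes`, which inserts a newline after every 76 characters of output. The sketch below uses a made-up 512-byte payload as a stand-in for an attachment; note that `encodebytes` emits bare `\n`, whereas a real mail library would use CRLF line endings on the wire.

```python
import base64

# Stand-in for an attachment: 512 bytes covering every byte value twice.
payload = bytes(range(256)) * 2

wrapped = base64.encodebytes(payload)  # bytes, newline after every 76 characters
lines = wrapped.splitlines()

print(len(lines), "lines; longest is", max(len(line) for line in lines), "characters")

# The decoder skips the line breaks and recovers the original bytes.
assert base64.decodebytes(wrapped) == payload
```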

Data URIs

RFC 2397 defines the data: URI scheme, which lets you embed small files directly into HTML or CSS [5]. A tiny icon as a data URI looks like:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...

Larry Masinter of Xerox published this spec in August 1998. It's handy for reducing HTTP requests on small assets, though you shouldn't go overboard -- large Base64 strings in HTML bloat the document and can't be cached independently.
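Building a data URI is mostly string concatenation. In this sketch, `to_data_uri` is a hypothetical helper, and the input is just the 8-byte PNG file signature rather than a complete image, so the result is not a displayable icon.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data: URI (illustrative helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# The fixed 8-byte signature that starts every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"

print(to_data_uri(png_signature, "image/png"))
```

Because every PNG starts with the same signature, Base64-encoded PNGs share the `iVBORw0KGgo` prefix visible in the example URI above.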

HTTP Basic Authentication

When a server demands Basic auth, the client concatenates the username, a colon, and the password, then Base64-encodes the result and sends it in the Authorization header [6]:

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

That decodes to Aladdin:open sesame. I should stress that this is encoding, not encryption -- anyone who intercepts the header can trivially decode it. Basic auth only makes sense over HTTPS.
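A small sketch shows both directions. The helper name `basic_auth_header` and the choice of UTF-8 for the credentials are assumptions of this example; the credentials themselves are the ones from the header above.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header_value = basic_auth_header("Aladdin", "open sesame")
print(header_value)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Anyone who sees the header can reverse it without a key.
token = header_value.split(" ", 1)[1]
print(base64.b64decode(token).decode("utf-8"))  # Aladdin:open sesame
```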

JWT tokens

JSON Web Tokens use a URL-safe variant of Base64 (more on that below) to encode the header and payload. A JWT is three base64url-encoded segments separated by dots [7]. The JOSE (JSON Object Signing and Encryption) family of specs, starting with RFC 7515 for JWS, formalized the base64url encoding rules for this use case [8].
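To see the three-segment structure, the following sketch assembles a token-shaped string and takes it apart again. Everything here is illustrative: the helper names, the claims, and especially the signature segment, which is a placeholder and not a real HMAC. No verification is performed, so this is not how a token should be validated in practice.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """base64url without '=' padding, as JWS specifies."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890", "name": "Example User"}

token = ".".join([
    b64url_encode(json.dumps(header).encode("utf-8")),
    b64url_encode(json.dumps(payload).encode("utf-8")),
    b64url_encode(b"placeholder-not-a-real-signature"),
])

# Reading the header and payload back requires no key at all.
header_seg, payload_seg, signature_seg = token.split(".")
print(json.loads(b64url_decode(header_seg)))
print(json.loads(b64url_decode(payload_seg)))
```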

PEM certificates

Those -----BEGIN CERTIFICATE----- blocks you see in TLS configuration? That's Base64-encoded DER data, wrapped in 64-character lines, with PEM headers and footers. The format traces directly back to the Privacy Enhanced Mail specs from the late 1980s [3], even though hardly anyone uses PEM for actual email anymore.
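The wrapping itself is simple enough to sketch by hand. `pem_wrap` and `pem_unwrap` are hypothetical helpers, and the input below is arbitrary placeholder bytes rather than a real DER certificate, so the output has the right shape but would not parse as a certificate.

```python
import base64

def pem_wrap(der: bytes, label: str = "CERTIFICATE") -> str:
    """Base64-encode DER bytes and wrap them PEM-style at 64 characters."""
    body = base64.b64encode(der).decode("ascii")
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"

def pem_unwrap(pem: str) -> bytes:
    """Drop the header and footer lines and decode the Base64 body."""
    body = "".join(
        line for line in pem.splitlines()
        if line and not line.startswith("-----")
    )
    return base64.b64decode(body)

placeholder_der = bytes(range(200))  # NOT a real certificate
pem = pem_wrap(placeholder_der)
print(pem)
assert pem_unwrap(pem) == placeholder_der
```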

The URL-safe variant

Standard Base64 uses + and / as its 62nd and 63rd characters. Both of those have special meaning in URLs and filenames -- + is sometimes interpreted as a space in query strings, and / is a path separator. This makes standard Base64 output unsafe to embed directly in a URL without percent-encoding.

RFC 4648 defines base64url, which swaps + for - (minus) and / for _ (underscore) [1]. Some implementations also omit the = padding, since the encoded length already implies how many padding characters there would be. RFC 7515 (JWS) explicitly specifies base64url with no padding [8].

| Character | Standard Base64 | URL-safe Base64   |
|-----------|-----------------|-------------------|
| Index 62  | +               | -                 |
| Index 63  | /               | _                 |
| Padding   | = (required)    | = (often omitted) |

The two variants are otherwise identical -- same algorithm, same 6-bit grouping, same 33% overhead.
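The difference only shows up when the input happens to produce index 62 or 63. The two bytes below were picked for this sketch precisely because they do.

```python
import base64

# 0xFB 0xFF splits into the 6-bit values 62, 63, 60, hitting both special indices.
data = b"\xfb\xff"

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=

# Unpadded form, as used by JWS; the padding can be restored from the length.
unpadded = url_safe.rstrip("=")
restored = base64.urlsafe_b64decode(unpadded + "=" * (-len(unpadded) % 4))
assert restored == data
```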

Base64 vs. its siblings

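RFC 4648 actually standardizes three related encodings [1]:

| Encoding     | Characters per byte | Alphabet size            | Overhead | Padding     |
|--------------|---------------------|--------------------------|----------|-------------|
| Base16 (hex) | 2                   | 16 (0-9, A-F)            | 100%     | None needed |
| Base32       | 1.6                 | 32 (A-Z, 2-7)            | ~60%     | =           |
| Base64       | 1.33                | 64 (A-Z, a-z, 0-9, +, /) | ~33%     | =           |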

Base16 is just hexadecimal. You see it in hash digests, MAC addresses, color codes. It's case-insensitive, which is nice, but the 100% size penalty is rough for large payloads.

Base32 shows up in places like TOTP (the shared secrets behind those 6-digit authenticator app codes are usually Base32-encoded) and Tor onion addresses. Its alphabet leaves out the easily confused digits 0, 1, and 8 (9 is absent as well) and uses a single letter case, which makes it better for human-readable contexts.

Base64 wins on efficiency and dominates everywhere that human readability doesn't matter -- which, in machine-to-machine communication, is most places.
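The overhead figures can be checked directly with the standard library. The 3000-byte payload size is an arbitrary choice for this sketch, picked as a multiple of both 3 and 5 so that neither Base64 nor Base32 needs padding.

```python
import base64

payload = bytes(3000)  # 3000 zero bytes; the content does not affect the length

encoders = {
    "Base16": base64.b16encode,
    "Base32": base64.b32encode,
    "Base64": base64.b64encode,
}

for name, encode in encoders.items():
    size = len(encode(payload))
    overhead = (size - len(payload)) / len(payload) * 100
    print(f"{name}: {size} characters, {overhead:.1f}% overhead")
```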

A common misconception

Base64 is not encryption. I keep seeing it treated as though encoding credentials or tokens in Base64 provides some kind of security. It doesn't. The encoding is entirely reversible by anyone with access to the data. There's no key, no secret, no cryptographic operation involved. It's a format transformation, nothing more.

The confusion probably stems from the fact that Base64 strings look like random gibberish to the untrained eye. But SGVsbG8gV29ybGQ= is just "Hello World" -- and any developer can decode it in about two seconds.
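Reversing it takes a single standard-library call and no key of any kind, as this short sketch shows.

```python
import base64

encoded = "SGVsbG8gV29ybGQ="
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # Hello World
```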

Citations

  1. RFC 4648: The Base16, Base32, and Base64 Data Encodings. S. Josefsson, October 2006

  2. RFC 989: Privacy Enhancement for Internet Electronic Mail: Part I. J. Linn, February 1987

  3. RFC 1421: Privacy Enhancement for Internet Electronic Mail: Part I. J. Linn, February 1993

  4. RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One. N. Freed, N. Borenstein, November 1996

  5. RFC 2397: The "data" URL scheme. L. Masinter, August 1998

  6. RFC 7617: The 'Basic' HTTP Authentication Scheme. J. Reschke, September 2015

  7. RFC 7519: JSON Web Token (JWT). M. Jones, J. Bradley, N. Sakimura, May 2015

  8. RFC 7515: JSON Web Signature (JWS). M. Jones, J. Bradley, N. Sakimura, May 2015

Updated: March 4, 2026