Base64 encoding explained

Base64 is a binary-to-text encoding that maps arbitrary bytes to 64 printable ASCII characters. Take any binary blob -- encrypted payloads, images, certificate data -- and re-encode it so every byte in the output is a harmless printable character. The trade-off is size: three input bytes become four output characters, a ~33% expansion [1].

It sounds almost too simple to matter. And yet email attachments, data URIs, JWT tokens, HTTP Basic Auth, and PEM certificates all depend on it.

Origins

The encoding first showed up in RFC 989, published February 1987 by John Linn at BBN Corp [2]. The problem was Privacy Enhanced Mail (PEM) -- a scheme for encrypting and signing email. Mail transfer agents of the 1980s could only reliably handle 7-bit ASCII, so any binary data (ciphertext, signatures) would get mangled in transit. Linn's fix mapped every 6 bits of input to one of 64 safe characters.

PEM went through several revisions -- RFC 1040 (1988), RFC 1113 (1989), RFC 1421 (February 1993) [3] -- but the core encoding barely changed. MIME, first specified in RFC 1341 (June 1992) and restated in November 1996 as RFC 2045 by Ned Freed and Nathaniel Borenstein, adopted Base64 as a content transfer encoding and brought it to mainstream email [4]. That's the version most developers encounter first.

How encoding works

The algorithm processes input in groups of three bytes -- 24 bits total. Those 24 bits split into four 6-bit groups, and each group (a value from 0 to 63) indexes into the Base64 alphabet [1]:

  • Values 0--25: A through Z
  • Values 26--51: a through z
  • Values 52--61: 0 through 9
  • Value 62: +
  • Value 63: /

A 65th character, =, is reserved for padding.

[Figure: The complete Base64 alphabet from RFC 4648, values 0 through 63 mapped to ASCII characters]

The alphabet was picked so that every character is printable, safe for 7-bit transport, represented identically in all versions of ISO 646 (including US-ASCII), and present in all versions of EBCDIC [4].

[Figure: Encoding the string 'Man' into Base64, three ASCII bytes split into four 6-bit groups, each mapped to a character]

Take "Man". ASCII values 77, 97, 110 give us 01001101 01100001 01101110 -- 24 bits. Split into four 6-bit chunks: 010011 (19), 010110 (22), 000101 (5), 101110 (46). Look each up: T, W, F, u. So "Man" becomes TWFu.
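The bit-shuffling for a single three-byte group can be sketched in a few lines of Python. This is an illustration only: the helper name encode_triplet is made up for this example, and in real code you would just call the standard library's base64 module.

```python
import base64

# The standard alphabet: A-Z, a-z, 0-9, then + and /.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_triplet(chunk: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters (hypothetical helper)."""
    if len(chunk) != 3:
        raise ValueError("expected exactly three bytes")
    # Pack the three bytes into one 24-bit integer.
    bits = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Peel off four 6-bit groups, most significant first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


result = encode_triplet(b"Man")
print(result)  # TWFu
# Cross-check against the standard library.
print(result == base64.b64encode(b"Man").decode("ascii"))  # True
```

The shifts 18, 12, 6, 0 are just the positions of the four 6-bit groups inside the 24-bit value.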

Padding

When input length isn't a multiple of three, the encoder pads the final group with zero bits and appends = characters. Two remaining bytes produce three Base64 characters plus one =. One remaining byte produces two characters plus == [1].

[Figure: Padding depends on how many bytes remain in the last group]
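Here is a rough sketch of the padding rule as a complete encoder, assuming the standard alphabet. The function name b64encode_sketch is invented for this example; it exists only to make the zero-fill and = substitution visible, and it is checked against Python's own base64.b64encode.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_sketch(data: bytes) -> str:
    """Illustrative Base64 encoder with padding (not a replacement for the stdlib)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Fill the short group with zero bytes so it is 24 bits wide.
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters built purely from filler bits are replaced by "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


for sample in (b"M", b"Ma", b"Man"):
    print(sample, b64encode_sketch(sample))
# b'M' TQ==
# b'Ma' TWE=
# b'Man' TWFu
```

One missing byte means one =, two missing bytes mean ==, which is exactly what lets a decoder know how many trailing bytes are real.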

"Ma" (two bytes) encodes to TWE=. "M" (one byte) encodes to TQ==. The padding tells the decoder exactly how many real bytes to expect at the tail end -- though some implementations (notably JWT) drop it entirely, since the output length already implies the answer.

The 33% overhead

Every 3 input bytes produce 4 output bytes: a ratio of 4/3, roughly 33.3% expansion [4]. A 1 MB file becomes about 1.33 MB. Not great, but the point was never efficiency -- it's compatibility with text-only channels.

Hex encoding (Base16) doubles the size. Base32 adds ~60%. So Base64 is actually the most space-efficient of the standard binary-to-text schemes defined in RFC 4648 [1].
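These ratios are easy to check with Python's base64 module. The 3000-byte payload below is an arbitrary choice for this sketch; it is a multiple of both 3 and 5, so padding doesn't skew the numbers.

```python
import base64
import os

# Arbitrary test payload; 3000 is divisible by 3 and 5, so no padding is added.
payload = os.urandom(3000)


def overhead_percent(encoded: bytes) -> float:
    """Size increase of the encoded form relative to the raw payload, in percent."""
    return (len(encoded) - len(payload)) / len(payload) * 100


overheads = {
    "base16": overhead_percent(base64.b16encode(payload)),
    "base32": overhead_percent(base64.b32encode(payload)),
    "base64": overhead_percent(base64.b64encode(payload)),
}

for name, pct in overheads.items():
    print(f"{name}: +{pct:.1f}%")
# base16: +100.0%
# base32: +60.0%
# base64: +33.3%
```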

Where it shows up

Email (MIME) -- The original use case. RFC 2045 specifies Base64 as a Content-Transfer-Encoding, with output split into lines of at most 76 characters [4]. When you attach an image, your mail client Base64-encodes the binary and embeds it right in the message body.
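For a feel of the 76-character line limit, Python's base64.encodebytes produces MIME-style output with a newline after each full line. The 100-byte input is placeholder data for this sketch, not a real attachment.

```python
import base64

# Placeholder "attachment": 100 arbitrary bytes.
data = bytes(range(100))

# encodebytes inserts a newline after every 76 output characters.
wrapped = base64.encodebytes(data)
lines = wrapped.decode("ascii").splitlines()

for line in lines:
    print(len(line), line)

# decodebytes ignores the line breaks and recovers the original bytes.
restored = base64.decodebytes(wrapped)
print(restored == data)  # True
```

A real mail client does more than this (headers, MIME boundaries, CRLF line endings), but the encoded body follows the same wrapping rule.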

Data URIs -- RFC 2397, published by Larry Masinter of Xerox in August 1998, defines the data: URI scheme for embedding small files inline in HTML or CSS [5]. A small icon might look like data:image/png;base64,iVBORw0KGgo.... Handy for saving HTTP requests on tiny assets, though it bloats the HTML and keeps the asset from being cached on its own.
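Building such a URI is mostly string concatenation. In this sketch the helper name to_data_uri is hypothetical, and the input is only the 8-byte PNG file signature rather than a complete image, which is enough to show where the familiar iVBORw0KGgo prefix comes from.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a base64 data: URI for the given bytes (illustrative helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The fixed 8-byte signature every PNG file starts with (not a full image).
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

uri = to_data_uri(PNG_SIGNATURE, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```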

HTTP Basic Auth -- The client concatenates username, colon, password, and Base64-encodes the result into the Authorization header [6]. Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ== decodes to Aladdin:open sesame. I should stress -- this is encoding, not encryption. Anyone intercepting the header decodes it trivially. Only makes sense over HTTPS.
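A minimal sketch of how a client might build that header value, and how trivially anyone can reverse it. The function name basic_auth_header is made up for this example, and UTF-8 is assumed as the credential charset.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an Authorization header for Basic auth (illustrative)."""
    # Assumption: credentials are encoded as UTF-8 before Base64.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


header_value = basic_auth_header("Aladdin", "open sesame")
print(header_value)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Reversing it takes one call and no key.
recovered = base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")
print(recovered)  # Aladdin:open sesame
```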

JWT tokens -- JSON Web Tokens use a URL-safe variant (see below): the header, payload, and signature are each base64url-encoded and joined with dots into three segments [7]. The JOSE specs, starting with RFC 7515 for JWS, formalized this [8].
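To see the three-segment shape, here is a sketch that assembles a token-like string and decodes its header again. Everything in it is illustrative: the claims are invented, the "signature" is placeholder bytes rather than a real HMAC, and the helper names are made up. It does no verification at all, so treat it as a look at the encoding, not as a JWT library.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """base64url without padding, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890"}          # invented claim for the example
fake_signature = b"not-a-real-signature"  # placeholder, NOT a valid signature

token = ".".join([
    b64url_encode(json.dumps(header).encode("utf-8")),
    b64url_encode(json.dumps(payload).encode("utf-8")),
    b64url_encode(fake_signature),
])

header_seg, payload_seg, signature_seg = token.split(".")
decoded_header = json.loads(b64url_decode(header_seg))
print(decoded_header)  # {'alg': 'HS256', 'typ': 'JWT'}
```

The padding can be restored mechanically because a valid segment length leaves only one possible number of missing = characters.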

PEM certificates -- Those -----BEGIN CERTIFICATE----- blocks in TLS configs are Base64-encoded DER data, wrapped in 64-character lines. The format traces back to the PEM specs from the late 1980s [3], even though nobody uses PEM for actual email anymore. RFC 7468 eventually formalized the textual encoding rules that most implementations had been following informally for years [9].
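The wrapping itself is simple enough to sketch. The helper der_to_pem is a name invented here, and the input is 100 placeholder bytes, not a real certificate, so the output has the right shape but would not parse as X.509.

```python
import base64


def der_to_pem(der: bytes, label: str = "CERTIFICATE") -> str:
    """Wrap DER bytes in PEM-style armor with 64-character lines (illustrative)."""
    body = base64.b64encode(der).decode("ascii")
    # Slice the Base64 text into lines of at most 64 characters.
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


# Placeholder bytes standing in for DER data; NOT a valid certificate.
fake_der = bytes(range(100))

pem = der_to_pem(fake_der)
print(pem)

# Going back: drop the armor lines, join the rest, decode.
pem_lines = pem.splitlines()
recovered_der = base64.b64decode("".join(pem_lines[1:-1]))
print(recovered_der == fake_der)  # True
```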

Browser APIs -- In JavaScript, btoa() encodes a binary string to Base64 and atob() decodes it back. The names are unintuitive (binary-to-ASCII and ASCII-to-binary), and they only handle Latin-1 characters -- for UTF-8, you need TextEncoder first [10]. Newer APIs like Uint8Array.fromBase64() are starting to appear.

The URL-safe variant

Standard Base64 uses + and / as characters 62 and 63. Both are problematic in URLs: + sometimes means a space in query strings, and / is a path separator. RFC 4648 defines base64url, which swaps + for - and / for _ [1]. Some implementations also drop the = padding -- RFC 7515 (JWS) explicitly requires base64url with no padding [8].

Character   Standard       URL-safe
Index 62    +              -
Index 63    /              _
Padding     = (required)   = (often omitted)

Same algorithm, same 6-bit grouping, same overhead. Just two characters swapped.
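The difference shows up only when the input happens to hit index 62 or 63. The three bytes below were picked for this sketch precisely because they do.

```python
import base64

# 0xFB 0xFF 0xFE splits into the 6-bit values 62, 63, 63, 62.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data)
url_safe = base64.urlsafe_b64encode(data)

print(standard)  # b'+//+'
print(url_safe)  # b'-__-'

# The URL-safe form is the standard form with two characters translated.
translated = standard.translate(bytes.maketrans(b"+/", b"-_"))
print(translated == url_safe)  # True
```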

Base64 vs. its siblings

RFC 4648 standardizes three related schemes [1]:

[Figure: The three binary-to-text encodings defined in RFC 4648, compared by encoding overhead and use case]

Base16 is plain hex -- you see it in hash digests, MAC addresses, CSS color codes. Case-insensitive, which is convenient, but the 100% size penalty hurts for anything beyond short values.

Base32 uses A-Z and 2-7, deliberately leaving out the digits that humans confuse with letters (0/O, 1/l/I, 8/B). That makes it better for things people actually type, like TOTP secrets or Tor onion addresses.

Base64 wins on efficiency. In machine-to-machine contexts -- which is most contexts -- the 33% overhead beats 60% or 100%.

Not encryption

Base64 isn't encryption, and I keep seeing people treat it as though it were. The encoding is entirely reversible. No key, no secret, no cryptographic operation. SGVsbG8gV29ybGQ= is just "Hello World" -- any developer decodes it in seconds.

The confusion probably comes from the fact that Base64 output looks like random gibberish. It isn't. It's a format transformation, nothing more.
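If you want to convince a skeptical colleague, the whole "attack" is one standard-library call, with no key or secret involved:

```python
import base64

encoded = "SGVsbG8gV29ybGQ="

# No key, no secret: decoding is a plain, public transformation.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # Hello World

# And it round-trips exactly, because nothing is hidden or lost.
print(base64.b64encode(decoded.encode("ascii")).decode("ascii") == encoded)  # True
```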

Citations

  1. RFC 4648: The Base16, Base32, and Base64 Data Encodings. S. Josefsson, October 2006

  2. RFC 989: Privacy Enhancement for Internet Electronic Mail: Part I. J. Linn, February 1987

  3. RFC 1421: Privacy Enhancement for Internet Electronic Mail: Part I. J. Linn, February 1993

  4. RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One. N. Freed, N. Borenstein, November 1996

  5. RFC 2397: The "data" URL scheme. L. Masinter, August 1998

  6. RFC 7617: The 'Basic' HTTP Authentication Scheme. J. Reschke, September 2015

  7. RFC 7519: JSON Web Token (JWT). M. Jones, J. Bradley, N. Sakimura, May 2015

  8. RFC 7515: JSON Web Signature (JWS). M. Jones, J. Bradley, N. Sakimura, May 2015

  9. RFC 7468: Textual Encodings of PKIX, PKCS, and CMS Structures. S. Josefsson, S. Leonard, April 2015

  10. MDN Web Docs: Window: btoa() method. Retrieved March 16, 2026

Updated: March 16, 2026