The Basics and Principles of the Base64 Algorithm
Introduction
Base64 is a group of binary-to-text encoding schemes that represent binary data as an ASCII string by translating the data into a radix-64 representation.
It uses the following alphabet to represent the radix-64 digits, with “=” as the padding character:
A-Z, a-z, 0-9, +, /
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
In URLs, certain characters such as /, ?, and # have special meanings. To make Base64 output safe to use in URLs, the URL-safe variant replaces the characters + and / with - and _, which do not cause problems in URL path segments or query parameters.
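The difference between the two alphabets can be seen with Python's standard base64 module. This is a small sketch; the three input bytes are an assumption, chosen only because they map to indices 62 and 63.

```python
import base64

# Three bytes whose four 6-bit groups are 62, 63, 63, 62,
# i.e. exactly the two indices that differ between the alphabets.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +//+
print(url_safe)  # -__-
```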
Base64 encoding schemes are commonly used to encode binary data for storage or transfer over media that can only deal with ASCII text.
Common applications of Base64 include:
- Email via MIME
- Storing complex data in XML
- Encoding binary data so it can be included in a data: URL
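As an illustration of the last item, a data: URL can be assembled roughly as follows. The helper name to_data_url and the text/plain MIME type are assumptions made for this sketch, not part of any standard API.

```python
import base64

def to_data_url(data: bytes, mime_type: str) -> str:
    """Hypothetical helper: wrap raw bytes in a Base64 data: URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

print(to_data_url(b"Son", "text/plain"))  # data:text/plain;base64,U29u
```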
Algorithm
In Base64 encoding, each output character represents only 6 bits of data.
Why 64?
2 ^ 6 = 64
The Relationship between Base64 Index and Corresponding Characters
Character | Base64 Index |
A-Z | 0-25 |
a-z | 26-51 |
0-9 | 52-61 |
+ | 62 |
/ | 63 |
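The table above can be written as a single 64-character string, so that the position of a character is its Base64 index. A minimal sketch follows; the names ALPHABET, index_to_char, and char_to_index are chosen for this example only.

```python
import string

# Position in this string == Base64 index from the table above.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def index_to_char(index: int) -> str:
    """Look up the character for a 6-bit value (0-63)."""
    return ALPHABET[index]

def char_to_index(char: str) -> int:
    """Look up the 6-bit value for a Base64 character."""
    return ALPHABET.index(char)

print(len(ALPHABET))      # 64
print(index_to_char(20))  # U
print(char_to_index("w")) # 48
```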
One ASCII character is stored in 1 byte (8 bits).
In the ASCII table, the characters with decimal values 0 to 31 and 127 are control characters, while the characters with decimal values 32 to 126 are printable characters. Many text-oriented protocols and media can reliably carry only these 95 printable characters; bytes outside this range may be altered or rejected along the way.
So how can arbitrary binary data be transmitted? One way to achieve this is by using Base64 encoding.
One Base64 character carries 6 bits, while a byte holds 8 bits. Therefore, a method is needed to represent 8-bit data using 6-bit units.
3 × 8 bits = 4 × 6 bits = 24 bits
Each Base64 digit represents 6 bits of data. So, three 8-bit bytes of the input string/binary file (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
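The regrouping of three bytes into four 6-bit values can be sketched with shifts and a mask. The function name split_into_sextets is an assumption for this example; the input values are the ASCII codes of “S”, “o”, and “n”.

```python
def split_into_sextets(b0: int, b1: int, b2: int) -> list[int]:
    """Combine three 8-bit bytes into 24 bits, then cut them into four 6-bit values."""
    block = (b0 << 16) | (b1 << 8) | b2
    # 0x3F is 0b111111, which keeps only the lowest 6 bits after each shift.
    return [(block >> shift) & 0x3F for shift in (18, 12, 6, 0)]

print(split_into_sextets(83, 111, 110))  # [20, 54, 61, 46]
```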
Example 1
Suppose the content we want to encode is “China”. Referring to the ASCII code table, the corresponding binary data for “China” is:
01000011 01101000 01101001 01101110 01100001
The encoding process is as follows:
1. The original data is split into groups of 3 bytes, giving 3*8=24 bits per group. If fewer than 3 bytes remain at the end, they form a shorter final group. For “China” (5 bytes), the groups are “Chi” and “na”.
2. Each group of 24 bits is divided into 4 units of 6 bits. In a shorter final group, the last unit is padded with zero bits at the end until it is 6 bits long.
3. Two additional zero bits are inserted before each unit of 6 bits, so that each unit occupies one byte and a full group becomes 32 bits. The value of each byte is a Base64 index.
4. Each index is replaced by the corresponding character from the Base64 code table. If the final group produces fewer than 4 characters, it is filled up to 4 characters with the “=” symbol. The resulting encoded data is: Q2hpbmE=
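The four steps can be sketched as a small encoder. This is an illustrative implementation under the assumptions stated in the comments, not a replacement for a library routine; the name encode_base64 is chosen for this example, and the result is compared against Python's standard base64 module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_base64(data: bytes) -> str:
    """Illustrative encoder that follows the grouping and padding steps."""
    output = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]        # step 1: groups of up to 3 bytes
        missing = 3 - len(chunk)             # 0, 1, or 2 bytes short
        # Zero-fill the short group so it can be treated as 24 bits.
        block = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Steps 2 and 3: four 6-bit values, each used as a table index.
        chars = [ALPHABET[(block >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Step 4: units that contain no input bits become "=".
        if missing:
            chars[4 - missing:] = ["="] * missing
        output.append("".join(chars))
    return "".join(output)

print(encode_base64(b"China"))  # Q2hpbmE=
print(encode_base64(b"China") == base64.b64encode(b"China").decode("ascii"))  # True
```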
Based on the analysis above, Base64 encoding converts every 3 bytes of data into 4 bytes, so the encoded output is about 4/3 the size of the input (an increase of roughly 33%), and slightly more when the final group has to be padded.
Example 2
Character | S | o | n | |
ASCII | 83 | 111 | 110 | |
Binary | 01010011 | 01101111 | 01101110 | |
6 bits per group | 010100 | 110110 | 111101 | 101110 |
Padding with leading zeros | 00010100 | 00110110 | 00111101 | 00101110 |
Base64 Index | 20 | 54 | 61 | 46 |
Encoded data | U | 2 | 9 | u |
Here the string “Son” is encoded using Base64 and results in the string “U29u”. This is an example where three characters are converted exactly into four Base64 characters, so no padding is needed.
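If only the string “S” is encoded, a single byte remains, so the second 6-bit group is filled with zero bits and two padding characters are added:
Character | S | | | |
ASCII | 83 | | | |
Binary | 01010011 | | | |
6 bits per group | 010100 | 110000 | | |
Padding with leading zeros | 00010100 | 00110000 | | |
Base64 Index | 20 | 48 | | |
Encoded data | U | w | = | = |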
The resulting encoded data is: Uw==
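The effect of the input length on padding can be checked with Python's standard base64 module. This is a quick sketch; the two-byte input “So” is an extra case added here for illustration.

```python
import base64

# 1, 2, and 3 input bytes need 2, 1, and 0 padding characters respectively.
results = {
    word: base64.b64encode(word.encode("ascii")).decode("ascii")
    for word in ("S", "So", "Son")
}

for word, encoded in results.items():
    print(word, "->", encoded)
# S -> Uw==
# So -> U28=
# Son -> U29u
```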
Advantages
- Embedding small resources such as images directly in a page as Base64 reduces the number of HTTP requests and avoids cross-origin issues for those resources
Disadvantages
- Base64 increases file size by about one third, and Base64 images embedded in HTML or CSS must be downloaded and parsed together with that file, which can block HTML and CSS parsing. Externally linked images, by contrast, can continue to load after page rendering has completed without causing any blocking.