Ever typed out a perfect 150-character SMS, added a single emoji, and suddenly your message became three segments? That’s the difference between GSM-7 and Unicode encoding, and understanding it can save you significant money on SMS campaigns.
This guide explains exactly why a single emoji can double or even triple your messaging costs, and how to optimize your texts for maximum efficiency.
The Short Answer
| Encoding | Characters per Single SMS | Characters per Multi-part SMS |
|---|---|---|
| GSM-7 | 160 characters | 153 characters per segment |
| Unicode (UCS-2) | 70 characters | 67 characters per segment |
One emoji = Unicode = 70 character limit instead of 160.
That’s a 56% reduction in available space—from a single character.
Test Your Message Encoding Instantly
Not sure if your message uses GSM-7 or Unicode? Our free tool detects encoding automatically and shows exactly how many segments your message will use.
What is GSM-7 Encoding?
GSM-7 (Global System for Mobile Communications 7-bit) is the original character encoding standard for SMS, developed in the 1980s. It uses 7 bits per character, allowing 160 characters in a single 140-byte SMS payload.
The GSM-7 Character Set
GSM-7's basic table has exactly 128 code points, one of which is the escape code that switches to the extension table. The extension table adds 10 extended characters:
Standard Characters (7 bits each):
A-Z a-z 0-9
@ £ $ ¥ è é ù ì ò Ç Ø ø Å å
Δ _ Φ Γ Λ Ω Π Ψ Σ Θ Ξ
Æ æ ß É
Ä Ö Ñ Ü § ä ö ñ ü à
! " # ¤ % & ' ( ) * + , - . /
: ; < = > ? ¡ ¿
Space, newline, carriage return
Extended Characters (14 bits = 2 character spaces):
^ { } \ [ ~ ] | € (form feed)
Why 160 Characters?
The math is elegantly simple:
- SMS payload = 140 bytes (1,120 bits)
- GSM-7 = 7 bits per character
- 1,120 ÷ 7 = 160 characters
This is why 160 became the magic number for SMS length.
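The 160 limit comes from how GSM-7 text is packed: each 7-bit code (a "septet") is written least-significant bit first into consecutive bytes, so no bits are wasted. Below is a minimal sketch of that packing. `packSeptets` is our own name, and it takes GSM-7 code values (0-127) rather than text. For plain ASCII letters the GSM-7 codes happen to equal the ASCII codes.

```javascript
// Pack GSM-7 codes (0-127) into bytes, least-significant bit first,
// the way SMS stores GSM-7 text when there is no header.
function packSeptets(septets) {
  const out = [];
  let buffer = 0;   // bits not yet written out
  let bitCount = 0; // number of valid bits in buffer
  for (const septet of septets) {
    if (!Number.isInteger(septet) || septet < 0 || septet > 0x7f) {
      throw new RangeError(`not a 7-bit value: ${septet}`);
    }
    buffer |= septet << bitCount;
    bitCount += 7;
    while (bitCount >= 8) {
      out.push(buffer & 0xff);
      buffer >>= 8;
      bitCount -= 8;
    }
  }
  if (bitCount > 0) out.push(buffer & 0xff); // flush the last partial byte
  return Uint8Array.from(out);
}

const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');

// ASCII letters have the same codes in GSM-7, so charCodeAt is enough here.
const hello = Array.from('hellohello', (c) => c.charCodeAt(0));
console.log(toHex(packSeptets(hello)));                     // e8329bfd4697d9ec37
console.log(packSeptets(new Array(160).fill(0x41)).length); // 140
```

Ten characters fit in 9 bytes. A full 160 septets fill exactly 140 bytes (1,120 bits).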
What is Unicode (UCS-2) Encoding?
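When your message contains characters outside the GSM-7 set, the sending phone or SMS platform automatically switches to Unicode encoding (specifically UCS-2, the fixed-width 16-bit predecessor of UTF-16).
Unicode uses 16 bits per character, more than double GSM-7’s 7 bits. Characters beyond the Basic Multilingual Plane, which includes most emojis, are sent as UTF-16 surrogate pairs and take two 16-bit units. Each of them therefore counts as 2 of your 70 characters.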
The Math Behind 70 Characters
- SMS payload = 140 bytes (1,120 bits)
- Unicode = 16 bits per character
- 1,120 ÷ 16 = 70 characters
That’s it. The same physical SMS capacity, but each character takes more than twice the space.
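Below is a minimal sketch of the 16-bit side. Each UTF-16 code unit becomes two big-endian bytes. JavaScript strings are already sequences of UTF-16 code units, so an emoji outside the Basic Multilingual Plane appears as a surrogate pair and takes 4 bytes. Strict UCS-2 cannot represent such characters; we assume here, as is common in practice, that the pair is sent as-is. `encodeUcs2` is a hypothetical name.

```javascript
// Encode text as big-endian 16-bit units, as in an SMS using UCS-2 data coding.
// Surrogate pairs (most emojis) are passed through as two units.
function encodeUcs2(text) {
  const bytes = new Uint8Array(text.length * 2); // text.length counts UTF-16 units
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes[2 * i] = unit >> 8;       // high byte first
    bytes[2 * i + 1] = unit & 0xff; // then low byte
  }
  return bytes;
}

console.log(encodeUcs2('Привет').length);        // 12 bytes for 6 Cyrillic letters
console.log(encodeUcs2('😀').length);            // 4 bytes: one emoji, two 16-bit units
console.log(encodeUcs2('x'.repeat(70)).length); // 140 bytes: a full single SMS
```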
What Triggers Unicode Encoding?
A single non-GSM character forces the entire message into Unicode mode. Common triggers include:
Emojis (Most Common)
😀 😎 🎉 ❤️ 👍 🔥 ✨ 🚀
Every emoji requires Unicode. There’s no workaround.
Non-Latin Alphabets
Chinese: 你好
Arabic: مرحبا
Russian: Привет
Greek: Γειά σου
Hebrew: שלום
Japanese: こんにちは
Korean: 안녕하세요
Special Symbols
™ © ® • … — – ″
Smart quotes: “ ” ‘ ’
Mathematical: × ÷ ≠ ≈ ∞
Currency: ₹ ₽ ₿ (by contrast, £, $ and ¥ are basic GSM-7 characters and € is an extended one)
Accented Characters NOT in GSM
While GSM-7 includes some European characters (é, è, ù, ì, ò, Ç, Ø, ø, Å, å, Ä, Ö, Ñ, Ü, ä, ö, ñ, ü, à), many accented characters require Unicode:
GSM-7: é è ù ì ò à ä ö ñ ü
Unicode required: ê ë î ï ô û á í ó ú ā ē ī ō ū ă ş ț
Multi-Part (Concatenated) SMS
When your message exceeds the single-SMS limit, it’s split into multiple segments. But here’s the catch: each segment loses characters to a header.
How Concatenation Works
┌─────────────────────────────────┐
│ Segment 1 │
│ [Header: 6 bytes] [Message...] │
├─────────────────────────────────┤
│ Segment 2 │
│ [Header: 6 bytes] [Message...] │
├─────────────────────────────────┤
│ Segment 3 │
│ [Header: 6 bytes] [Message...] │
└─────────────────────────────────┘
The User Data Header (UDH) in each segment contains:
- Reference number (to reassemble the message)
- Total number of segments
- Current segment number
This header takes 6 bytes (48 bits) per segment. In GSM-7, those 48 bits are padded with one fill bit to reach a septet boundary, so the header costs 7 characters ((1,120 - 49) ÷ 7 = 153). In Unicode, the header leaves 134 bytes, or 67 characters:
| Encoding | Single SMS | Per Segment (Multi-part) |
|---|---|---|
| GSM-7 | 160 chars | 153 chars |
| Unicode | 70 chars | 67 chars |
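Below is a minimal sketch that builds the concatenation header and derives the per-segment limits from it. It assumes the common variant with an 8-bit reference number: length byte, element ID `0x00`, element length `0x03`, then reference, total, and sequence. The function names are hypothetical.

```javascript
// Build the 6-byte concatenation header: UDHL, IEI, IEDL, reference, total, sequence.
function buildConcatUdh(reference, totalParts, partNumber) {
  return Uint8Array.from([
    0x05,             // UDHL: 5 more header bytes follow
    0x00,             // IEI 0x00: concatenated message, 8-bit reference
    0x03,             // IEDL: 3 bytes of element data
    reference & 0xff, // same value in every part of one message
    totalParts,
    partNumber,
  ]);
}

const PAYLOAD_BITS = 140 * 8; // 1,120 bits of user data per SMS

// Characters left for text once a header of `headerBytes` is present.
function charsPerSegment(encoding, headerBytes) {
  const headerBits = headerBytes * 8;
  if (encoding === 'GSM-7') {
    // Pad the header to the next septet boundary so the text starts on a septet.
    const paddedBits = Math.ceil(headerBits / 7) * 7;
    return Math.floor((PAYLOAD_BITS - paddedBits) / 7);
  }
  // UCS-2: 16 bits per character; the header is already whole bytes.
  return Math.floor((PAYLOAD_BITS - headerBits) / 16);
}

const udh = buildConcatUdh(42, 3, 1);
console.log(charsPerSegment('GSM-7', 0), charsPerSegment('GSM-7', udh.length)); // 160 153
console.log(charsPerSegment('UCS-2', 0), charsPerSegment('UCS-2', udh.length)); // 70 67
```

The 16-bit-reference variant of the header is 7 bytes. Passing `7` instead gives 152 (GSM-7) and 66 (UCS-2).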
Real-World Impact
A 161-character GSM message:
- Requires 2 segments
- Uses 161 of available 306 characters (153 × 2)
- You pay for 2 SMS credits
A 71-character Unicode message:
- Requires 2 segments
- Uses 71 of available 134 characters (67 × 2)
- You pay for 2 SMS credits
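Both examples follow one rule: a message fits in one SMS up to the single-message limit, and beyond that you divide by the per-segment limit and round up. Below is a minimal sketch with a hypothetical function name. It takes a length already counted in the encoding's units (septets for GSM-7, 16-bit units for UCS-2).

```javascript
// Limits from the table above.
const LIMITS = {
  'GSM-7': { single: 160, perSegment: 153 },
  'UCS-2': { single: 70, perSegment: 67 },
};

function segmentCount(length, encoding) {
  const { single, perSegment } = LIMITS[encoding];
  return length <= single ? 1 : Math.ceil(length / perSegment);
}

console.log(segmentCount(160, 'GSM-7')); // 1
console.log(segmentCount(161, 'GSM-7')); // 2 (room for 306)
console.log(segmentCount(71, 'UCS-2'));  // 2 (room for 134)
```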
The Extended GSM Characters Trap
Ten characters are technically part of GSM-7 but count as two characters because they use an escape sequence:
^ { } \ [ ~ ] | € (and form feed)
Example
"Price: €50"
= 10 characters visually
= 11 characters for SMS counting (€ = 2)
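Counting correctly only requires charging 2 for each extended character. Below is a minimal sketch that assumes the text is already known to be GSM-7. `gsm7Length` is a hypothetical name, and `\f` is the form feed.

```javascript
// Extended GSM-7 characters, each sent as ESC + code (2 septets).
const GSM_7_EXTENDED = new Set('^{}\\[~]|€\f');

// Septet count for text made only of basic and extended GSM-7 characters.
function gsm7Length(text) {
  let septets = 0;
  for (const ch of text) {
    septets += GSM_7_EXTENDED.has(ch) ? 2 : 1;
  }
  return septets;
}

console.log('Price: €50'.length, gsm7Length('Price: €50')); // 10 11
```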
This catches many people off guard, especially with the euro symbol (€).
Cost Implications
Understanding encoding directly impacts your SMS budget.
Scenario: 10,000 Message Campaign
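Message A (GSM-7, 124 characters, counting each of the three line breaks as one):
FLASH SALE! Get 25% off all items today only.
Use code SAVE25 at checkout.
Shop now: example.com/sale
Reply STOP to opt out.
- Encoding: GSM-7
- Segments: 1
- Total SMS credits: 10,000
Message B (same content with an emoji and a space in front):
🎉 FLASH SALE! Get 25% off all items today only.
Use code SAVE25 at checkout.
Shop now: example.com/sale
Reply STOP to opt out.
- Encoding: Unicode (due to 🎉)
- Length: 127 units (🎉 is a surrogate pair and counts as 2). That is over 70, so the message needs 2 segments (127 ÷ 67 = 1.9 → 2)
- Total SMS credits: 20,000
One emoji doubled the cost. If Message A were 132 to 160 characters long, the same emoji and space would push Message B past 134 units (2 × 67) and into 3 segments, tripling the cost.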
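You can double-check the counts for the text exactly as printed by letting JavaScript count for you. The sketch assumes each line break is a single newline character; a CR+LF line ending would add one per break. Message A contains no extended characters, so its string length equals its septet count. For Message B, `.length` counts UTF-16 units, which is how UCS-2 length is measured.

```javascript
const messageA =
  'FLASH SALE! Get 25% off all items today only.\n' +
  'Use code SAVE25 at checkout.\n' +
  'Shop now: example.com/sale\n' +
  'Reply STOP to opt out.';
const messageB = '🎉 ' + messageA;
const RECIPIENTS = 10000;

// One SMS up to the single limit, otherwise ceil(length / perSegment).
function segments(length, single, perSegment) {
  return length <= single ? 1 : Math.ceil(length / perSegment);
}

const segA = segments(messageA.length, 160, 153); // GSM-7 limits
const segB = segments(messageB.length, 70, 67);   // UCS-2 limits; 🎉 counts as 2

console.log(messageA.length, segA, segA * RECIPIENTS); // 124 1 10000
console.log(messageB.length, segB, segB * RECIPIENTS); // 127 2 20000
```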
How to Optimize Your SMS Messages
1. Avoid Emojis in Business SMS
Yes, emojis add personality. But for transactional or bulk messaging, the cost often outweighs the benefit.
Instead of: Your order shipped! 📦
Use: Your order shipped!
2. Watch for Smart Quotes
Word processors and some phones auto-convert quotes:
Unicode (bad): “Hello” (curly quotes)
GSM-7 (good): "Hello" (straight quotes)
3. Use GSM-7 Equivalents
| Unicode | GSM-7 Alternative |
|---|---|
| — (em dash) | - (hyphen) |
| … (ellipsis) | ... (three periods) |
| × (multiplication) | x (letter x) |
| “ ” (smart quotes) | " (straight quote) |
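These substitutions are easy to automate before sending. Below is a minimal sketch. `toGsmFriendly` and its replacement map are our own; the map covers the table's characters plus the en dash and single smart quotes. Replacing characters changes the text, so review the output wherever the exact symbol matters. Emojis and non-Latin letters still need a human decision.

```javascript
// Hypothetical map from common Unicode punctuation to GSM-7 look-alikes.
const GSM_FRIENDLY = new Map([
  ['\u2014', '-'],   // U+2014 em dash
  ['\u2013', '-'],   // U+2013 en dash
  ['\u2026', '...'], // U+2026 horizontal ellipsis
  ['\u00D7', 'x'],   // U+00D7 multiplication sign
  ['\u201C', '"'],   // U+201C left double quotation mark
  ['\u201D', '"'],   // U+201D right double quotation mark
  ['\u2018', "'"],   // U+2018 left single quotation mark
  ['\u2019', "'"],   // U+2019 right single quotation mark (apostrophe)
]);

function toGsmFriendly(text) {
  let out = '';
  for (const ch of text) out += GSM_FRIENDLY.get(ch) ?? ch;
  return out;
}

console.log(toGsmFriendly('\u201CHello\u201D \u2014 it\u2019s 2\u00D73\u2026'));
// "Hello" - it's 2x3...
```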
4. Test Before Sending
Always verify encoding before launching campaigns:
Check your message with our free SMS Calculator →
5. Consider Your Audience
If messaging users who primarily speak languages requiring Unicode (Chinese, Arabic, etc.), you can’t avoid Unicode. Factor the higher per-message cost into your budget.
When Unicode is Worth It
Despite the cost, Unicode is sometimes the right choice:
- International audiences: Communicating in native languages builds trust
- Brand differentiation: A well-placed emoji can increase engagement significantly
- Character-limited contexts: Sometimes Unicode symbols convey meaning more efficiently than words
The Engagement vs. Cost Tradeoff
Some marketing studies report that emojis can increase engagement by 25-30%. If your message generates enough additional conversions, the extra cost may be justified.
Calculate your break-even:
If Unicode costs 3x as much (3 segments vs 1):
Engagement must increase by 200% (triple) to keep the same return per credit spent
OR the extra conversions must be worth enough to cover the added cost
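The 200% figure is the lift that keeps the same return per credit. If the goal is only to cover the extra spend, the required lift depends on the price per segment and the value of a conversion. Below is a minimal sketch of both calculations. The function `breakEven` and every input value (price, conversion rate, value per conversion) are hypothetical placeholders, not benchmarks.

```javascript
// All inputs are placeholders; substitute your own campaign numbers.
function breakEven({ recipients, pricePerSegment, segmentsPlain, segmentsEmoji,
                     conversionRate, valuePerConversion }) {
  // Extra spend caused only by the additional segments
  const extraCost = recipients * pricePerSegment * (segmentsEmoji - segmentsPlain);
  // Conversions the plain message is expected to produce
  const baseConversions = recipients * conversionRate;
  // Extra conversions whose value exactly covers the extra spend
  const extraConversionsNeeded = extraCost / valuePerConversion;
  return {
    extraCost,
    extraConversionsNeeded,
    profitNeutralLift: extraConversionsNeeded / baseConversions,
    // Lift that keeps the same return per credit: cost ratio minus one
    sameReturnPerCreditLift: segmentsEmoji / segmentsPlain - 1,
  };
}

const result = breakEven({
  recipients: 10000,
  pricePerSegment: 0.01,  // hypothetical price per segment
  segmentsPlain: 1,
  segmentsEmoji: 3,
  conversionRate: 0.02,   // hypothetical 2% conversion rate
  valuePerConversion: 20, // hypothetical value of one conversion
});
console.log(result);
// extra cost 200, 10 extra conversions, 5% profit-neutral lift, 200% same-return lift
```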
Technical Deep Dive: How Detection Works
SMS systems automatically detect encoding using a simple algorithm:
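function detectEncoding(message, gsm7Chars) {
  // gsm7Chars: a Set of every basic and extended GSM-7 character
  for (const character of message) {
    if (!gsm7Chars.has(character)) {
      return 'UNICODE'; // one non-GSM character switches the whole message
    }
  }
  return 'GSM_7';
}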
The check happens before transmission. There’s no “partial Unicode”—it’s all or nothing.
The GSM-7 Detection Set
For developers implementing SMS systems, here’s the exact character set to check against:
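// Basic table in code order; the escape code (0x1B) is not a printable character
const GSM_7_BASIC = new Set(
  '@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ ' +
  '!"#¤%&\'()*+,-./0123456789:;<=>?' +
  '¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ' +
  '¿abcdefghijklmnopqrstuvwxyzäöñüà'
);
// Extension table: each is sent as ESC + code and costs 2 septets (\f = form feed)
const GSM_7_EXTENDED = new Set('^{}\\[~]|€\f');

// Returns true if every character can be sent in GSM-7
function isGSM7(text) {
  for (const char of text) { // for...of yields whole code points, so emojis stay intact
    if (!GSM_7_BASIC.has(char) && !GSM_7_EXTENDED.has(char)) {
      return false;
    }
  }
  return true;
}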
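Combining detection with the counting rules from earlier sections gives a rough segment calculator that also reports which characters forced Unicode. Below is a minimal sketch with a hypothetical name, `analyzeSms`. It ignores refinements some platforms apply, such as avoiding splitting an escape sequence or an emoji across two segments, or using national language shift tables.

```javascript
// GSM-7 basic table (escape code omitted) and extension table, as listed above.
const GSM_7_BASIC = new Set(
  '@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ ' +
  '!"#¤%&\'()*+,-./0123456789:;<=>?' +
  '¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ' +
  '¿abcdefghijklmnopqrstuvwxyzäöñüà'
);
const GSM_7_EXTENDED = new Set('^{}\\[~]|€\f'); // each costs 2 septets

// Single-SMS and per-segment limits for each encoding
const LIMITS = {
  'GSM-7': { single: 160, perSegment: 153 },
  'UCS-2': { single: 70, perSegment: 67 },
};

function analyzeSms(text) {
  const triggers = []; // distinct characters that force UCS-2
  let septets = 0;     // GSM-7 length, used only if no triggers are found
  for (const ch of text) { // iterates code points, so an emoji is one ch
    if (GSM_7_BASIC.has(ch)) septets += 1;
    else if (GSM_7_EXTENDED.has(ch)) septets += 2; // ESC + code
    else if (!triggers.includes(ch)) triggers.push(ch);
  }
  const encoding = triggers.length === 0 ? 'GSM-7' : 'UCS-2';
  // UCS-2 length is in 16-bit units, which is exactly what JS string length counts.
  const length = encoding === 'GSM-7' ? septets : text.length;
  const { single, perSegment } = LIMITS[encoding];
  const segments = length <= single ? 1 : Math.ceil(length / perSegment);
  return { encoding, length, segments, triggers };
}

console.log(analyzeSms('Price: €50'));             // GSM-7, length 11, 1 segment
console.log(analyzeSms('Your order shipped! 📦')); // UCS-2, length 22, 1 segment, trigger 📦
```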
Quick Reference Card
| Question | GSM-7 | Unicode |
|---|---|---|
| Max chars (single SMS) | 160 | 70 |
| Max chars (per segment) | 153 | 67 |
| Bits per character | 7 | 16 |
| Supports emojis? | No | Yes |
| Supports Latin alphabet? | Yes | Yes |
| Supports Chinese/Arabic/etc.? | No | Yes |
| Cost comparison | 1x | ~2-3x |
Don’t Guess—Test Your Message
Paste your SMS text into our calculator to instantly see:
- ✓ Character count (including extended chars)
- ✓ Encoding type (GSM-7 or Unicode)
- ✓ Number of segments
- ✓ Which characters trigger Unicode
Conclusion
The difference between GSM-7 and Unicode encoding is one of the most important—and frequently overlooked—aspects of SMS messaging. A single emoji or special character can more than double your costs overnight.
Key takeaways:
- GSM-7 = 160 characters; Unicode = 70 characters
- One non-GSM character forces the entire message to Unicode
- Extended GSM characters (€, {, }, etc.) count as 2 characters
- Multi-part messages lose characters to headers in every segment: 7 per segment in GSM-7 (160 → 153) and 3 in Unicode (70 → 67)
- Always test your messages before bulk sending
Understanding these technical constraints isn’t just academic—it’s directly tied to your SMS budget and campaign effectiveness. Write smarter, spend less, and reach more customers with optimized messages.
WhatIsSMS.com
SMS Technology Guide