Python Strings — Master Text Processing in Python
Learning Objectives
By the end of this tutorial, you will be able to:
- Create strings using different syntaxes and understand escape sequences
- Use indexing and slicing to extract and manipulate substrings
- Apply all 40+ built-in string methods organized by category
- Format strings using %-formatting, .format(), and f-strings
- Understand string immutability and its implications
- Encode and decode strings for different character sets
- Avoid common string-related pitfalls
What Are Strings in Python?
A string is an immutable sequence of Unicode characters. Strings are one of the most frequently used data types in Python — you use them for text, file paths, network requests, data parsing, and much more.
# Creating a string
greeting = "Hello, World!"
print(type(greeting)) # <class 'str'>
print(len(greeting)) # 13
Single vs Double Quotes
Python treats single and double quotes identically. Choose whichever is more readable for your content:
# These are equivalent
name = "Alice"
name = 'Alice'
# Useful when the string contains one type of quote
sentence = "She said 'hello' to me."
question = 'What does "Python" mean?'
Triple Quotes for Multiline
Use triple quotes (""" or ''') for strings that span multiple lines. Whitespace is preserved:
poem = """
Roses are red,
Violets are blue,
Python is awesome,
And so are you.
"""
print(poem)
Creating Strings
Literal Syntax
empty = ""
single = 'Hello'
double = "Hello"
triple = """Multi
line"""
The str() Constructor
Convert other types to strings:
print(str(42)) # "42"
print(str(3.14)) # "3.14"
print(str(True)) # "True"
print(str(None)) # "None"
print(str([1, 2, 3])) # "[1, 2, 3]"
Escape Sequences
| Sequence | Character | Description |
|---|---|---|
| \n | Newline | Line break |
| \t | Tab | Horizontal tab |
| \\ | Backslash | Literal backslash |
| \' | Single quote | In single-quoted string |
| \" | Double quote | In double-quoted string |
| \r | Carriage return | First half of the Windows line ending (\r\n) |
| \0 | Null | Null character |
| \a | Bell | Alert/bell sound |
| \b | Backspace | Moves the cursor back one position |
| \f | Form feed | Page break |
# Escape sequences in action
print("Line one\nLine two")
# Line one
# Line two
print("Column1\tColumn2\tColumn3")
# Column1 Column2 Column3
print("Path: C:\\Users\\name\\file.txt")
# Path: C:\Users\name\file.txt
print("She said \"Hello!\"")
# She said "Hello!"
Raw Strings
Prefix a string with r to treat backslashes as literal characters. Essential for regex and Windows file paths:
# Regular string: \n is a newline and \f is a form feed
print("C:\new\file.txt")
# C:
# ew<form feed>ile.txt
# Raw string: \n is literal backslash + n
print(r"C:\new\file.txt")
# C:\new\file.txt
# Useful for regex patterns
import re
pattern = r"\d+\.\d+" # Match numbers like 3.14
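The pattern above is defined but never applied. As a minimal sketch of how it might be used, the snippet below runs it through `re.findall`; the `sample` text is a made-up input chosen only for illustration.

```python
import re

# Same pattern as above: digits, a literal dot, then more digits.
pattern = r"\d+\.\d+"

# Hypothetical sample text, not from the tutorial.
sample = "pi is 3.14, e is 2.718, and 42 has no decimal point"

matches = re.findall(pattern, sample)
print(matches)  # ['3.14', '2.718']
```

The bare `42` is skipped because the pattern requires a dot between two runs of digits.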
String Indexing and Slicing
Index Positions
String: H e l l o
Index: 0 1 2 3 4
Neg: -5 -4 -3 -2 -1
text = "Hello"
# Positive indexing (left to right)
print(text[0]) # H
print(text[4]) # o
# Negative indexing (right to left)
print(text[-1]) # o
print(text[-5]) # H
Slicing Syntax
Extract substrings using [start:stop:step]:
- start: Inclusive (where to begin)
- stop: Exclusive (where to end — character at this index is NOT included)
- step: Stride between selected indexes (default 1; a negative step walks backward)
text = "Python3."
# Basic slicing
print(text[0:6]) # Python
print(text[2:5]) # tho
print(text[6:]) # 3.
print(text[:6]) # Python
# With step
print(text[::2]) # Pto3 (every other character)
print(text[1::2]) # yhn. (every other, starting at index 1)
# Negative slicing
print(text[-3:]) # n3.
print(text[:-3]) # Pytho
print(text[-5:-2]) # hon
# Reversing a string
print(text[::-1]) # .3nohtyP
Common Slicing Patterns
s = "abcdefghij" # length 10
# First 3 characters
print(s[:3]) # abc
# Last 3 characters
print(s[-3:]) # hij
# Middle portion
print(s[3:7]) # defg
# Every third character (indexes 0, 3, 6, 9)
print(s[::3]) # adgj
# Reverse
print(s[::-1]) # jihgfedcba
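One way to see why the exclusive stop is convenient is the small check below. It is only an illustrative sketch: it verifies that cutting a string at any index and rejoining the halves gives back the original, and it shows that out-of-range slice bounds are clamped rather than raising an error.

```python
s = "abcdefghij"

# Because stop is exclusive, s[:k] and s[k:] never overlap or leave a gap.
for k in range(len(s) + 1):
    assert s[:k] + s[k:] == s

# For 0 <= a <= b <= len(s), the slice s[a:b] has exactly b - a characters.
chunk = s[3:7]
print(chunk, len(chunk))  # defg 4

# Slices never raise IndexError; out-of-range bounds are clamped.
print(s[5:100])      # fghij
print(repr(s[20:]))  # ''
```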
String Immutability
Strings in Python are immutable — once created, you cannot change individual characters:
name = "Hello"
# name[0] = "J" # TypeError: 'str' object does not support item assignment
# Instead, create a new string
name = "J" + name[1:]
print(name) # Jello
Memory Implications
Each string operation creates a new string object. In tight loops, prefer join() over repeated concatenation:
# Inefficient: creates many intermediate strings
result = ""
for i in range(1000):
    result += str(i)  # New string created each iteration
# Efficient: builds a list, then joins once
parts = []
for i in range(1000):
    parts.append(str(i))
result = "".join(parts)
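To make the "new object" point concrete, here is a short sketch. The variable names are illustrative only. It shows that string methods return a new string and that rebinding a name leaves the old object, and any other reference to it, unchanged.

```python
original = "hello"
shouted = original.upper()  # returns a new str; original is untouched

print(original)  # hello
print(shouted)   # HELLO

# Rebinding a name does not mutate the old object: other references
# to that object still see the old value.
alias = original
original = original + " world"
print(alias)     # hello
print(original)  # hello world
```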
All String Methods — Complete Reference
Case Methods
text = "Hello, World!"
print(text.upper()) # HELLO, WORLD!
print(text.lower()) # hello, world!
print(text.title()) # Hello, World!
print(text.capitalize()) # Hello, world!
print(text.swapcase()) # hELLO, wORLD!
# casefold() — aggressive lowercase for caseless matching
print("Straße".casefold()) # strasse (German sharp s -> ss)
print("HELLO".casefold()) # hello
Search Methods
text = "Hello, World! Hello, Python!"
# find() — returns index or -1 if not found
print(text.find("Hello")) # 0
print(text.find("Hello", 1)) # 14 (start searching from index 1)
print(text.find("Java")) # -1
# rfind() — find from the right
print(text.rfind("Hello")) # 14
# index() — like find() but raises ValueError if not found
print(text.index("World")) # 7
# text.index("Java") # ValueError!
# rindex() — index from the right
print(text.rindex("Hello")) # 14
# count() — count occurrences
print(text.count("Hello")) # 2
print(text.count("l")) # 5
# startswith() and endswith()
print(text.startswith("Hello")) # True
print(text.endswith("Python!")) # True
# Can use a tuple of suffixes
print(text.endswith(("Python!", "World!"))) # True
# startswith with start/end range
print(text.startswith("Hello", 14)) # True
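The `start` argument of `find()` makes it possible to walk through every occurrence, not just the first. The helper below is a sketch of that idea; `find_all` is a hypothetical name, not a built-in, and returning an empty list for an empty search string is an assumption made here to keep the loop simple.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        return []  # assumption: treat an empty search string as "no matches"
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match we found.
        start = text.find(sub, start + len(sub))
    return positions

text = "Hello, World! Hello, Python!"
print(find_all(text, "Hello"))  # [0, 14]
print(find_all(text, "o"))      # [4, 8, 18, 25]
print(find_all(text, "Java"))   # []
```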
Transformation Methods
# Strip whitespace (or specified characters)
text = " Hello, World! "
print(text.strip()) # "Hello, World!"
print(text.lstrip()) # "Hello, World! "
print(text.rstrip()) # " Hello, World!"
# Strip specific characters
print("***Hello***".strip("*")) # "Hello"
print("xyzHelloxyz".strip("xyz")) # "Hello"
# replace() — substitute substrings
text = "Hello, World!"
print(text.replace("World", "Python")) # Hello, Python!
print(text.replace("l", "L", 2)) # HeLLo, World! (replace first 2 only)
# center(), ljust(), rjust() — padding
print("Hello".center(20, "-")) # -------Hello--------
print("Hello".ljust(20, ".")) # Hello...............
print("Hello".rjust(20, ".")) # ...............Hello
# zfill() — zero-pad numbers
print("42".zfill(5)) # 00042
print("-42".zfill(5)) # -0042
# expandtabs() — control tab stops
print("H\te\tl\tl\to".expandtabs(4))
# H   e   l   l   o
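A detail that is easy to miss in the strip examples above: the argument to `strip()`, `lstrip()`, and `rstrip()` is a set of characters, not a prefix or suffix. The sketch below uses a made-up filename to show the difference from `removesuffix()` (Python 3.9+).

```python
filename = "happy.py"

# rstrip(".py") removes any trailing '.', 'p', or 'y' characters,
# so it keeps going well past the extension.
stripped = filename.rstrip(".py")
print(stripped)  # ha

# removesuffix() removes the exact suffix once, or nothing at all.
trimmed = filename.removesuffix(".py")
print(trimmed)   # happy
```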
Split and Join Methods
# split() — split into list
text = "apple,banana,cherry"
print(text.split(",")) # ['apple', 'banana', 'cherry']
# split with limit
print("a-b-c-d".split("-", 2)) # ['a', 'b', 'c-d']
# rsplit() — split from the right
print("a-b-c-d".rsplit("-", 2)) # ['a-b', 'c', 'd']
# split() with no args — splits on any whitespace
text = "Hello World\t\tFoo"
print(text.split()) # ['Hello', 'World', 'Foo']
# splitlines() — split on line boundaries
text = "Line 1\nLine 2\nLine 3"
print(text.splitlines()) # ['Line 1', 'Line 2', 'Line 3']
# splitlines with keepends
print(text.splitlines(True)) # ['Line 1\n', 'Line 2\n', 'Line 3']
# join() — combine iterable of strings
words = ["Python", "is", "awesome"]
print(" ".join(words)) # Python is awesome
print(",".join(words)) # Python,is,awesome
print("\n".join(words)) # Python\nis\nawesome
Test Methods (Boolean Checks)
# Alphabetic
print("Hello".isalpha()) # True
print("Hello123".isalpha()) # False
print("".isalpha()) # False
# Digits
print("12345".isdigit()) # True
print("12.34".isdigit()) # False
print("½".isdigit()) # False
# Alphanumeric
print("Hello123".isalnum()) # True
print("Hello 123".isalnum()) # False (space is not alphanumeric)
# Whitespace
print(" \t\n".isspace()) # True
print(" ".isspace()) # True
print("".isspace()) # False
# Case checks
print("HELLO".isupper()) # True
print("hello".islower()) # True
print("Hello World".istitle()) # True
print("hello World".istitle()) # False
# Numeric (broader than isdigit — includes Roman numerals, fractions)
print("123".isnumeric()) # True
print("½".isnumeric()) # True
print("²".isnumeric()) # True
# Decimal (strict — only base-10 digits)
print("123".isdecimal()) # True
print("½".isdecimal()) # False
# Identifier (valid Python variable name?)
print("my_var".isidentifier()) # True
print("123var".isidentifier()) # False
print("_private".isidentifier()) # True
print("class".isidentifier()) # True (reserved word passes!)
# ASCII
print("Hello".isascii()) # True
print("Héllo".isascii()) # False
# Printable
print("Hello".isprintable()) # True
print("Hello\n".isprintable()) # False (newline is not printable)
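The three numeric checks are easy to confuse, so the sketch below prints them side by side for a few sample strings. The helper `looks_like_int` is hypothetical and shows one possible way to validate plain integer input; it assumes that only an optional sign followed by ASCII digits should be accepted.

```python
samples = ["123", "²", "½", "12.34", "-5"]

# Print the three checks side by side for each sample.
for s in samples:
    print(f"{s!r:>8}  isdecimal={s.isdecimal()!s:<5}  "
          f"isdigit={s.isdigit()!s:<5}  isnumeric={s.isnumeric()}")

def looks_like_int(s):
    """Hypothetical helper: optional sign followed by ASCII digits only."""
    body = s[1:] if s[:1] in "+-" else s
    return body.isascii() and body.isdecimal()

print(looks_like_int("-5"))     # True
print(looks_like_int("²"))      # False
print(looks_like_int("12.34"))  # False
```

Note that none of the `is*` methods accept a sign or a decimal point, which is why `"-5"` and `"12.34"` fail all three checks on their own.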
Encoding Methods
# encode() — string to bytes
text = "Hello, World!"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(ascii_bytes) # b'Hello, World!'
print(utf8_bytes) # b'Hello, World!'
print(type(ascii_bytes)) # <class 'bytes'>
# decode() — bytes to string
print(ascii_bytes.decode("ascii")) # Hello, World!
# Encoding with emoji (requires Unicode)
emoji = "Hello 🐍"
print(emoji.encode("utf-8")) # b'Hello \xf0\x9f\x90\x8d'
print(emoji.encode("utf-8").decode("utf-8")) # Hello 🐍
# Handle encoding errors
text = "Héllo Wörld"
print(text.encode("ascii", errors="replace")) # b'H?llo W?rld'
print(text.encode("ascii", errors="ignore")) # b'Hllo Wrld'
print(text.encode("ascii", errors="xmlcharrefreplace")) # b'H&#233;llo W&#246;rld'
Formatting Methods
# maketrans() and translate() — character-level translation
table = str.maketrans("aeiou", "12345")
text = "hello world"
print(text.translate(table)) # h2ll4 w4rld
# Multiple replacements
table = str.maketrans({"a": "A", "e": "E", "i": "I"})
print("apple".translate(table)) # ApplE
# format_map() — like format() but with a mapping
data = {"name": "Alice", "age": 30}
print("Hello, {name}! Age: {age}".format_map(data))
# Hello, Alice! Age: 30
# partition() — split into three parts
text = "hello=world=python"
print(text.partition("=")) # ('hello', '=', 'world=python')
print(text.rpartition("=")) # ('hello=world', '=', 'python')
# removeprefix() and removesuffix() (Python 3.9+)
print("HelloWorld".removeprefix("Hello")) # World
print("HelloWorld".removesuffix("World")) # Hello
print("test.py".removesuffix(".py")) # test
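A common use of `partition()` is splitting a `key=value` line on the first separator only. The function below is a minimal sketch of that pattern; `parse_setting` and the sample lines are made up for illustration.

```python
def parse_setting(line):
    """Split 'key=value' on the first '='; return None if there is no '='."""
    key, sep, value = line.partition("=")
    if not sep:
        return None  # partition() returns an empty separator when nothing matched
    return key.strip(), value.strip()

print(parse_setting("name = Alice"))       # ('name', 'Alice')
print(parse_setting("url=http://x/?a=1"))  # ('url', 'http://x/?a=1')
print(parse_setting("no separator here"))  # None
```

Unlike `split("=")`, this always yields exactly three parts, so the unpacking never fails when the value itself contains `=`.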
Additional String Methods
# swapcase() — swap case of each character
print("Hello World".swapcase()) # hELLO wORLD
# title() — capitalize first letter of each word
print("hello world".title()) # Hello World
# capitalize() — capitalize first character only
print("hello world".capitalize()) # Hello world
# expandtabs() — replace tabs with spaces
print("H\te\tl\tl\to".expandtabs(4)) # H   e   l   l   o
String Formatting
Python offers three ways to embed values in strings. Modern code should prefer f-strings.
%-Formatting (Old Style)
name = "Alice"
age = 30
greeting = "Hello, %s! You are %d years old." % (name, age)
print(greeting) # Hello, Alice! You are 30 years old.
# Format specifiers
pi = 3.14159
print("Pi is approximately %.2f" % pi) # Pi is approximately 3.14
print("Pi is approximately %.4f" % pi) # Pi is approximately 3.1416
# Padding and alignment
print("%20s" % "right") # '               right' (right-aligned, width 20)
print("%-20s" % "left") # 'left                ' (left-aligned, width 20)
print("%05d" % 42) # 00042
str.format()
# Basic usage
name = "Alice"
age = 30
print("Hello, {}! You are {} years old.".format(name, age))
# Positional arguments
print("{0} is {1}, and {0} is a name.".format("Alice", "Python"))
# Keyword arguments
print("{name} is {lang}".format(name="Alice", lang="Python"))
# Format specifications
pi = 3.14159
print("{:.2f}".format(pi)) # 3.14
print("{:>10}".format("right")) # '     right'
print("{:<10}".format("left")) # 'left      '
print("{:^10}".format("center")) # '  center  '
print("{:0>5}".format(42)) # 00042
# Nested access
person = {"name": "Alice", "age": 30}
print("{p[name]} is {p[age]} years old.".format(p=person))
f-strings (Preferred)
name = "Alice"
age = 30
# Basic f-string
print(f"Hello, {name}! You are {age} years old.")
# Expressions inside f-strings
print(f"Next year you'll be {age + 1}.")
print(f"{'Adult' if age >= 18 else 'Minor'}")
print(f"{name.upper()}")
print(f"{2 ** 10}") # 1024
# Format specifications
pi = 3.14159
print(f"Pi to 2 decimals: {pi:.2f}") # 3.14
print(f"Pi to 4 decimals: {pi:.4f}") # 3.1416
print(f"Right-aligned: {name:>15}") # 'Right-aligned:           Alice'
print(f"Left-aligned: {name:<15}") # 'Left-aligned: Alice          '
print(f"Centered: {name:^15}") # 'Centered:      Alice     '
print(f"Zero-padded: {42:05d}") # 00042
# Percentage
ratio = 0.856
print(f"Score: {ratio:.1%}") # Score: 85.6%
# Comma separator for large numbers
print(f"{1000000:,}") # 1,000,000
print(f"{1000000:,.2f}") # 1,000,000.00
# Debugging with = (Python 3.8+)
x = 42
print(f"{x = }") # x = 42
print(f"{x + 10 = }") # x + 10 = 52
# Multiline f-strings
name = "Alice"
age = 30
info = (
    f"Name: {name}\n"
    f"Age: {age}\n"
    f"Adult: {age >= 18}"
)
print(info)
Format Specification Mini-Language
The full format spec follows: [[fill]align][sign][#][0][width][grouping][.precision][type]
# Align: < (left), > (right), ^ (center), = (pad after sign)
print(f"{'hello':>20}") # '               hello'
print(f"{'hello':*^20}") # *******hello********
print(f"{42:0=10}") # 0000000042
# Sign: + (always), - (only negative), space (space for positive)
print(f"{42:+d}") # +42
print(f"{-42:+d}") # -42
print(f"{42: d}") # ' 42' (leading space in place of a sign)
# Type specifiers
print(f"{42:b}") # 101010 (binary)
print(f"{42:o}") # 52 (octal)
print(f"{42:x}") # 2a (hex lowercase)
print(f"{42:X}") # 2A (hex uppercase)
print(f"{42:#b}") # 0b101010
print(f"{255:#x}") # 0xff
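Widths in a format spec can themselves be replacement fields, which helps when column widths are only known at run time. The sketch below lines up a small table; the `rows` data is invented for illustration.

```python
rows = [("apple", 3, 0.5), ("banana", 12, 0.25), ("cherry", 150, 12.0)]

# Width of the longest name, computed at run time (6 for this data).
name_width = max(len(name) for name, _, _ in rows)

# {name:<{name_width}} nests a replacement field inside the format spec.
lines = [
    f"{name:<{name_width}} | {qty:>5d} | {price:>8.2f}"
    for name, qty, price in rows
]
print("\n".join(lines))
# apple  |     3 |     0.50
# banana |    12 |     0.25
# cherry |   150 |    12.00
```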
String Concatenation
The + Operator
first = "Hello"
second = "World"
result = first + " " + second
print(result) # Hello World
The join() Method
# Efficient with join
words = ["Python", "is", "fun", "and", "powerful"]
sentence = " ".join(words)
print(sentence) # Python is fun and powerful
# join works with any separator
csv = ", ".join(["apple", "banana", "cherry"])
print(csv) # apple, banana, cherry
Why + Is Inefficient in Loops
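In the general case, each += operation creates a new string and copies all existing characters (CPython can sometimes extend the string in place, but that is an implementation detail you should not rely on):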
Iteration 1: "a" -> 1 char copied
Iteration 2: "ab" -> 2 chars copied
Iteration 3: "abc" -> 3 chars copied
...
Total copies: 1 + 2 + 3 + ... + n = O(n²)
With join(), the final string is built in a single allocation:
Total copies: O(n) — each character copied once
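The arithmetic above can be checked with a few lines of code. This sketch only models the naive copy count from the illustration (it does not measure a real interpreter), and `copies_naive` is a hypothetical helper name.

```python
def copies_naive(n):
    """Characters copied when appending one character n times, naive model."""
    return sum(range(1, n + 1))  # 1 + 2 + ... + n

for n in (10, 100, 1000):
    # The sum has the closed form n(n + 1) / 2, which grows like n².
    assert copies_naive(n) == n * (n + 1) // 2
    print(f"n={n}: += copies {copies_naive(n)} chars, join() copies {n}")
# n=10: += copies 55 chars, join() copies 10
# n=100: += copies 5050 chars, join() copies 100
# n=1000: += copies 500500 chars, join() copies 1000
```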
Unicode and Strings
Python 3 Strings Are Unicode
# Unicode strings work naturally
chinese = "你好世界"
arabic = "مرحبا بالعالم"
emoji = "🐍 Python 🚀"
print(chinese) # 你好世界
print(arabic) # مرحبا بالعالم
print(emoji) # 🐍 Python 🚀
Common Encodings
| Encoding | Description | Use Case |
|---|---|---|
| UTF-8 | Variable-width Unicode | Web, files, databases |
| ASCII | 7-bit English only | Legacy systems |
| Latin-1 | 8-bit Western European | Legacy text |
| UTF-16 | Variable-width Unicode (2 or 4 bytes per character) | Windows internal |
| UTF-32 | 32-bit Unicode | Fixed-width processing |
# Encoding and decoding
text = "Café naïve résumé"
# UTF-8 (default)
utf8 = text.encode("utf-8")
print(utf8) # b'Caf\xc3\xa9 na\xc3\xafve r\xc3\xa9sum\xc3\xa9'
# Check byte representation
print(len(text)) # 17 (characters)
print(len(utf8)) # 21 (bytes; each of the 4 accented characters uses 2 bytes)
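To compare the encodings in the table, the sketch below counts how many bytes a single character needs in each one. The helper name `byte_sizes` is made up, and the `-le` codec variants are used so that no byte-order mark inflates the counts.

```python
def byte_sizes(ch):
    """Bytes needed for ch in each encoding (little-endian, no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for ch in ["A", "é", "你", "🐍"]:
    print(ch, byte_sizes(ch))
# A {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# é {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# 你 {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# 🐍 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```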
Handling Encoding Errors
# Replace — replaces unknown chars with ?
print("café".encode("ascii", errors="replace")) # b'caf?'
# Ignore — drops unknown chars
print("café".encode("ascii", errors="ignore")) # b'caf'
# xmlcharrefreplace — uses XML entity references
print("café".encode("ascii", errors="xmlcharrefreplace"))
# b'caf&#233;'
# backslashreplace — uses Python escape
print("café".encode("ascii", errors="backslashreplace"))
# b'caf\\xe9'
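All of the handlers above are opt-in; the default is `errors="strict"`, which raises an exception. The sketch below shows what that looks like, and also what can happen when bytes are decoded with the wrong codec. The printed message format is just one illustrative choice.

```python
text = "café"

# The default handler is errors="strict", which raises instead of guessing.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    bad = exc.object[exc.start:exc.end]
    print(f"cannot encode {bad!r} as {exc.encoding}")
    # cannot encode 'é' as ascii

# Decoding with the wrong codec does not always fail; it can silently
# produce garbled text (mojibake) instead.
data = text.encode("utf-8")
print(data.decode("latin-1"))  # cafÃ©
print(data.decode("utf-8"))    # café
```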
Common Mistakes
Mistake 1: Forgetting Strings Are Immutable
s = "hello"
# s[0] = "H" # TypeError
# Instead create a new string
s = "H" + s[1:] # "Hello"
Mistake 2: Using is to Compare Strings
s1 = "hello"
s2 = "hello"
# if s1 is s2:  # Wrong: `is` tests object identity, not equality
if s1 == s2:  # Correct: compares the characters
    print("Same value")
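With two identical literals, `is` often appears to work because the interpreter may reuse one object for both. The sketch below builds the second string at run time to show where identity and equality can diverge; the identity result is implementation-dependent, so treat the second comment as typical CPython behavior and not as a guarantee.

```python
a = "hello"
b = "".join(["hel", "lo"])  # built at run time, so usually a separate object

print(a == b)  # True: same characters
print(a is b)  # typically False in CPython: two distinct objects
```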
Mistake 3: Confusing find() and index()
text = "Hello, World!"
pos = text.find("Python") # -1 (no error)
# pos = text.index("Python") # ValueError!
Mistake 4: Not Using Raw Strings for Regex
import re
# re.search("\bhello\b", "hello world") # \b means backspace!
re.search(r"\bhello\b", "hello world") # Correct
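The difference is easy to verify directly. In the sketch below, the non-raw pattern contains actual backspace characters and so cannot match ordinary text, while the raw pattern reaches the regex engine as a word-boundary assertion.

```python
import re

print(len("\b"), len(r"\b"))  # 1 2
print("\b" == "\x08")         # True: a single backspace character

raw_match = re.search(r"\bhello\b", "hello world")
plain_match = re.search("\bhello\b", "hello world")
inner_match = re.search(r"\bhello\b", "othello world")

print(raw_match is not None)  # True: word boundaries around "hello"
print(plain_match)            # None: the text contains no backspace characters
print(inner_match)            # None: no word boundary between "t" and "h"
```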
Mistake 5: Concatenating in a Loop Instead of Using join()
large_list = list(range(10_000))  # sample data for illustration
# Wrong: O(n²) performance, and it leaves a trailing ", "
result = ""
for item in large_list:
    result += str(item) + ", "
# Right: O(n) performance
result = ", ".join(str(item) for item in large_list)
Mistake 6: Forgetting split() Without Args Splits on Any Whitespace
text = "Hello   World\t\tFoo"  # three spaces, then two tabs
print(text.split()) # ['Hello', 'World', 'Foo'] — any whitespace
print(text.split(" ")) # ['Hello', '', '', 'World\t\tFoo'] — only spaces
Practice Exercises
Exercise 1: Reverse a String
def reverse_string(s):
    # Iterative: prepend each character to the result
    result = ""
    for char in s:
        result = char + result
    return result
def reverse_string_v2(s):
    # Recursive: reverse the tail, then append the first character
    if len(s) <= 1:
        return s
    return reverse_string_v2(s[1:]) + s[0]
print(reverse_string("Hello")) # olleH
print(reverse_string_v2("Python")) # nohtyP
Exercise 2: Count Vowels
def count_vowels(s):
    count = 0
    for char in s.lower():
        if char in "aeiou":
            count += 1
    return count
# One-liner version
def count_vowels_v2(s):
    return sum(1 for c in s.lower() if c in "aeiou")
print(count_vowels("Hello World")) # 3
print(count_vowels("Python")) # 1
print(count_vowels("AEIOU")) # 5
Exercise 3: Palindrome Checker
def is_palindrome(s):
    # Keep only letters and digits, lowercased, before comparing
    cleaned = "".join(c.lower() for c in s if c.isalnum())
    return cleaned == cleaned[::-1]
print(is_palindrome("racecar")) # True
print(is_palindrome("A man a plan a canal Panama")) # True
print(is_palindrome("hello")) # False
Exercise 4: Caesar Cipher
def caesar_cipher(text, shift):
    result = []
    for char in text:
        if char.isalpha():
            # Shift within the alphabet, wrapping around with % 26
            base = ord('A') if char.isupper() else ord('a')
            shifted = (ord(char) - base + shift) % 26 + base
            result.append(chr(shifted))
        else:
            result.append(char)
    return "".join(result)
print(caesar_cipher("Hello, World!", 3)) # Khoor, Zruog!
print(caesar_cipher("Khoor, Zruog!", -3)) # Hello, World!
Exercise 5: Word Frequency Counter
def word_frequency(text):
    words = text.lower().split()
    freq = {}
    for word in words:
        # Drop surrounding punctuation before counting
        word = word.strip(".,!?;:\"'")
        freq[word] = freq.get(word, 0) + 1
    return freq
text = "the cat sat on the mat the cat"
freq = word_frequency(text)
print(freq) # {'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1}
# Sort by frequency
sorted_freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
print(sorted_freq) # [('the', 3), ('cat', 2), ('sat', 1), ...]
Exercise 6: String Compression
def compress(s):
    if not s:
        return ""
    result = []
    count = 1
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            # A run of one is written without a count
            result.append(s[i - 1] + str(count) if count > 1 else s[i - 1])
            count = 1
    result.append(s[-1] + str(count) if count > 1 else s[-1])
    return "".join(result)
print(compress("aabcccccaaa")) # a2bc5a3 (single characters carry no count)
print(compress("abcdef")) # abcdef (no compression needed)
print(compress("aaabbb")) # a3b3
Exercise 7: Validate Email (Simplified)
def is_valid_email(email):
    if "@" not in email:
        return False
    # Split on the last "@" so the domain is everything after it
    local, domain = email.rsplit("@", 1)
    if not local or not domain:
        return False
    if "." not in domain:
        return False
    if not all(c.isalnum() or c in "._-+" for c in local):
        return False
    return True
print(is_valid_email("user@example.com")) # True
print(is_valid_email("user.name+tag@co.org")) # True
print(is_valid_email("invalid@")) # False
print(is_valid_email("no-at-sign.com")) # False
String Performance Tips
| Operation | Complexity | Recommended Alternative |
|---|---|---|
| s += "x" in loop | O(n²) | Use list.append() + join() |
| s.split() no args | O(n) | Best for whitespace splitting |
| s.replace() all | O(n) | Use re.sub() for complex patterns |
| "x" in s substring | O(n*m) | Use s.find() if index needed |
| s.count() | O(n) | Use collections.Counter for multiple counts |
# Performance comparison
import time
# Slow: concatenation in loop
start = time.perf_counter()  # perf_counter() is the clock intended for timing
result = ""
for i in range(10000):
    result += str(i)
print(f"Concatenation: {time.perf_counter() - start:.4f}s")
# Fast: join with generator
start = time.perf_counter()
result = "".join(str(i) for i in range(10000))
print(f"Join: {time.perf_counter() - start:.4f}s")
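A single timed run is noisy, so a more robust variant might use the standard `timeit` module and keep the best of several repeats. The sketch below is one way to do that; the function names `concat` and `join` and the repeat counts are arbitrary choices. Actual timings depend on the machine and interpreter, and on CPython the gap can be smaller than the O(n²) model suggests.

```python
import timeit

def concat(n=10_000):
    """Build the string with repeated += in a loop."""
    result = ""
    for i in range(n):
        result += str(i)
    return result

def join(n=10_000):
    """Build the same string with a single join() call."""
    return "".join(str(i) for i in range(n))

assert concat() == join()  # both approaches build the same string

for fn in (concat, join):
    # Take the best of 3 repeats, each running the function 10 times.
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{fn.__name__:>6}: {best:.4f}s for 10 runs")
```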
Key Takeaways
| Concept | Summary |
|---|---|
| Strings are immutable | You cannot modify them in place — operations return new strings |
| Single and double quotes | Functionally identical — choose for readability |
| Triple quotes | For multiline strings and docstrings |
| Escape sequences | \n, \t, \\ insert special characters |
| Raw strings | r"..." treats backslashes as literal — essential for regex |
| Indexing | s[i] — positive (left to right) and negative (right to left) |
| Slicing | s[start:stop:step] — stop is exclusive |
| str() constructor | Converts any object to its string representation |
| Case methods | upper(), lower(), title(), capitalize(), casefold() |
| Search methods | find(), index(), count(), startswith(), endswith() |
| Split/Join | split() breaks strings, join() combines them |
| Test methods | isalpha(), isdigit(), isalnum(), isspace() |
| f-strings | f"Hello, {name}" — preferred for string formatting |
| join() for concatenation | Use " ".join(list) instead of + in loops |
| Encoding | encode() converts to bytes, decode() converts back |
| maketrans/translate | Character-level translation for efficient replacements |
In the next tutorial, we'll explore Python Lists — ordered, mutable collections that work hand-in-hand with strings for powerful data processing.