Data structures

Python provides four core sequence and collection types: tuples, lists, dictionaries, and sets. Choosing the right type affects both correctness and performance.

Tuples and lists both hold ordered sequences of any type. The key distinction is mutability:

Tuples are immutable. Once created, their contents cannot change. Apply tuples for values that should not be modified — function return values, dictionary keys, or fixed configuration records.
Lists are mutable. Apply lists when a collection’s contents need to change over time — adding items, removing items, or sorting.

Tuples

Creating a tuple

Apply parentheses or a trailing comma to create a tuple. A trailing comma after a single value is required to distinguish a tuple from a grouped expression:

empty_tuple = ()

# A trailing comma creates a single-item tuple:
one_stooge = 'Larry',
# ('Larry',)

# Multiple items — no trailing comma needed on the last item:
all_stooges = 'Larry', 'Curly', 'Moe'
# ('Larry', 'Curly', 'Moe')

# Parentheses make multi-item tuples easier to read:
parens_stooges = ('Larry', 'Curly', 'Moe')

Tuple actions

Tuple unpacking assigns each element to a named variable in one statement:

parens_stooges = ('Larry', 'Curly', 'Moe')
a, b, c = parens_stooges
# a = 'Larry', b = 'Curly', c = 'Moe'

# Swap two variables without a temporary variable:
one = 'one'
two = 'two'
one, two = two, one
# one = 'two', two = 'one'

# Build a tuple from a list:
stooge_list = ['Larry', 'Curly', 'Moe']
tuple(stooge_list)
# ('Larry', 'Curly', 'Moe')

# Combine tuples with +:
('Larry',) + ('Curly', 'Moe')
# ('Larry', 'Curly', 'Moe')

# Repeat items:
('howdy',) * 5
# ('howdy', 'howdy', 'howdy', 'howdy', 'howdy')

# Compare tuples — comparison proceeds element-by-element from left to right:
x = (1, 2, 3)
y = (2, 3, 4)
x == y   # False
x < y    # True
x <= y   # True

# Concatenating tuples always creates a new object:
first  = ('one', 'two', 'three')
second = ('four',)
third  = first + second
id(first) == id(third)   # False — different objects

Extended unpacking

The * operator in an unpacking assignment captures remaining elements into a list. This pattern is useful when parsing structured records where the first few fields are fixed but the rest vary:

# Parse a log line: "2024-03-15 ERROR db.py Connection refused"
parts = ("2024-03-15", "ERROR", "db.py", "Connection", "refused")
date, level, *message_parts = parts
message = " ".join(message_parts)
# date = "2024-03-15", level = "ERROR", message = "Connection refused"

# Take the first and last, capture everything between:
first, *middle, last = [1, 2, 3, 4, 5]
# first = 1, middle = [2, 3, 4], last = 5

Iterate through a tuple

Apply for/in to iterate through a tuple:

nums = ('one', 'two', 'three', 'four')
for n in nums:
    print(n)
# one
# two
# three
# four

Lists

Lists hold ordered, mutable collections of any type. Apply them when the contents need to change, when order matters, and when you need index-based access.

Creating a list

The following examples show the most common ways to create a list:

# Empty list with brackets:
empty_list = []

# Empty list with list():
empty_too = list()

# List with initial values:
beatles = ['John', 'Paul', 'George', 'Ringo']

# Create a list from an iterable — splits a string into individual characters:
list('individual')
# ['i', 'n', 'd', 'i', 'v', 'i', 'd', 'u', 'a', 'l']

# Create a list from a tuple:
tup = ('one', 'two', 'three')
list(tup)
# ['one', 'two', 'three']

# Create a list by splitting a delimited string:
adj = 'once-in-a-lifetime'
adj.split('-')
# ['once', 'in', 'a', 'lifetime']

Getting list items

Retrieve items by offset or slice. Offsets are zero-based. Negative offsets count from the end:

turtles = ['Leonardo', 'Donatello', 'Michaelangelo', 'Raphael']

# By offset:
turtles[2]    # 'Michaelangelo'
turtles[-1]   # 'Raphael' — the last item

# By slice:
turtles[0:2]  # ['Leonardo', 'Donatello']

List functions

The following examples demonstrate the most common list operations:

nums = ['one', 'two', 'three', 'four']

nums.append('five')              # add to the end
# ['one', 'two', 'three', 'four', 'five']

nums.insert(0, 'zero')           # insert at an index
# ['zero', 'one', 'two', 'three', 'four', 'five']

['blah'] * 3                     # repeat
# ['blah', 'blah', 'blah']

numeros = ['uno', 'dos', 'tres']
nums.extend(numeros)             # append all items from another list

nums[2] = 'too'                  # replace by index
nums[6:] = ['six', 'seven', 'eight']  # replace by slice

del nums[0]                      # delete by index
nums.remove('seven')             # delete the first occurrence of a value

nums.pop()                       # remove and return the last item
nums.pop(1)                      # remove and return by index

nums.clear()                     # delete all items

nums.index('two')                # find the index of a value
'three' in nums                  # True if value is present

nums.count('two')                # count occurrences of a value
', '.join(nums)                  # join to a delimited string
len(nums)                        # length

# sorted() returns a new sorted list — the original is unchanged:
alpha_nums = sorted(nums)

# .sort() sorts the list in place — the original is modified:
nums.sort()

Copying lists

Assigning a list to a second variable does not create a copy — both names refer to the same object. Apply .copy(), list(), or a full slice to create an independent copy:

a = ['one', 'two', 'three']
b = a          # b and a point to the same list — modifying b changes a
c = a.copy()   # independent copy
d = list(a)    # also an independent copy
e = a[:]       # also an independent copy

id(a) == id(b)   # True  — same object
id(a) == id(c)   # False — different objects

a == c           # True  — equal contents

Iterating through lists

Apply for/in to iterate. Apply zip() to iterate two lists simultaneously — it pairs elements by position and stops at the shorter list:

ls = ['one', 'two', 'three', 'four', 'five']
for i in ls:
    print(i)
# one two three four five

es = ['uno', 'dos', 'tres', 'quatro', 'cinco']
for english, spanish in zip(ls, es):
    print(english, ":\t", spanish)
# one :    uno
# two :    dos
# three :  tres
# four :   quatro
# five :   cinco

Sorting with a key function

Pass a key= function to sorted() or .sort() to control sort order. The key function receives each element and returns the value to sort by. The following example sorts deployment records by timestamp:

deployments = [
    {"service": "api",      "deployed_at": "2024-03-15T14:30:00"},
    {"service": "worker",   "deployed_at": "2024-03-15T09:15:00"},
    {"service": "frontend", "deployed_at": "2024-03-15T16:45:00"},
]

# Sort by the deployed_at field — the key function extracts the sort value:
by_time = sorted(deployments, key=lambda d: d["deployed_at"])
# worker (09:15), api (14:30), frontend (16:45)

# Sort in reverse — most recent first:
most_recent = sorted(deployments, key=lambda d: d["deployed_at"], reverse=True)

List comprehensions

A list comprehension builds a list from an iterable in a single expression. The format is:

[expression for item in iterable]

This is equivalent to a for loop that appends to a list, but is more concise:

ls = ['one', 'two', 'three']

# Capitalize each item:
upper = [item.capitalize() for item in ls]
# ['One', 'Two', 'Three']

# Generate a range of numbers:
num_ls = [n for n in range(0, 10)]
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Apply an expression to each item:
index_adjusted = [n + 1 for n in num_ls]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Add a condition after the iterable to filter which items are included:

[expression for item in iterable if condition]

animals = ['cat', 'dog', 'mouse', 'rat', 'bird']

three    = [a for a in animals if len(a) == 3]
# ['cat', 'dog', 'rat']

long_cap = [a.capitalize() for a in animals if len(a) != 3]
# ['Mouse', 'Bird']

Nest multiple iterables by listing them one after the other — Python evaluates them from left to right, like nested for loops:

nums  = range(1, 4)
alpha = ['x', 'y']

# Equivalent to two nested for loops:
ls_comp = [(num, a) for num in nums for a in alpha]
# [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y'), (3, 'x'), (3, 'y')]

# Unpack tuples directly in the for clause:
for (num, a) in ls_comp:
    print(num, a)

Generator expressions

A generator expression has the same syntax as a list comprehension but with parentheses instead of brackets. It produces values one at a time rather than building the entire list in memory. Apply generator expressions when you only need to iterate over results once — for example, passing them directly to sum(), max(), or any():

from pathlib import Path

# A list comprehension builds the full list before sum() runs:
total = sum([f.stat().st_size for f in Path("/var/log").glob("*.log")])

# A generator expression yields one size at a time — no list is built:
total = sum(f.stat().st_size for f in Path("/var/log").glob("*.log"))

For thousands of log files, the generator expression uses a fixed amount of memory regardless of how many files exist. The list comprehension allocates memory proportional to the number of files.

Lists of lists

A list can contain other lists. Access nested items with chained bracket notation:

evens = [2, 4, 6, 8, 10]
odds  = [1, 3, 5, 7, 9]
prime = [1, 7, 13, 19, 23]
nums  = [evens, odds, prime]

nums[0]      # [2, 4, 6, 8, 10]
nums[0][2]   # 6