Variables & Data Types
Understand Dynamic Typing, Memory Allocation, and Reference Counting under the hood of CPython.
In statically typed languages like C++ or Java, a primitive variable is like an empty physical bucket. If you declare an integer bucket, it reserves a fixed amount of memory (typically 4 bytes for an int), and you pour a number into it. You can never pour a string into an integer bucket.
In Python, variables are not buckets. They are nametags (references) tied to
objects in memory, as if by a piece of string. The data itself lives independently in the computer's
memory. When you write x = 10, Python obtains an integer object `10` (the value wrapped
in a C structure) and ties a nametag called `x` to it.
Imagine a massive warehouse (RAM). You build a wooden chair (the Integer object `10`) and place it in the warehouse. Then, you take a sticky note, write "x" on it, and slap it on the chair.
Later, you type x = "Hello". In Python, this does NOT mean you painted the
wooden chair into a string. It means you built a brand new television (the String object
"Hello") on the other side of the warehouse, ripped the "x" sticky note off the chair, and
slapped it onto the television. The wooden chair is now abandoned.
# 1. Dynamic Binding
x = 42
print(type(x)) # Outputs: <class 'int'>
# 2. Rebinding (Changing the pointer)
x = "AI Engineer"
print(type(x)) # Outputs: <class 'str'>
# 3. Multiple Pointers to the same Object
a = [1, 2, 3]
b = a # b points to the EXACT same list
b.append(4)
print(a) # Outputs: [1, 2, 3, 4]
| Code Line | Explanation |
|---|---|
| `x = 42` | Python obtains a `PyLongObject` holding the number 42 and binds the name `x` in the namespace to it, incrementing the object's reference count. Memory for new objects comes from CPython's object allocator (pymalloc, layered on top of `malloc()`). Small integers such as 42 are pre-allocated and reused, so no new memory is claimed for this particular value. |
| `x = "AI Engineer"` | Python rebinds `x` to a `PyUnicodeObject` holding the string. The reference count of the `42` object drops by one. An ordinary object whose count reaches 0 is deallocated immediately; the cached `42` simply stays alive for the lifetime of the interpreter. |
| `b = a` | Python does NOT copy the list. It simply creates a second sticky note `b` and slaps it onto the exact same `PyListObject` in memory. The list's reference count becomes 2. |
| `b.append(4)` | The `append()` method mutates the single list object in place. Because `a` and `b` are looking at the exact same physical warehouse item, printing `a` reveals the mutation. |
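The aliasing behaviour described in the table can be observed directly. The sketch below is illustrative only: it assumes CPython, where `sys.getrefcount` reports the real reference count (plus one temporary reference held by the call itself), and it adds a `copy()` call, which is not in the original walkthrough, to contrast aliasing with an actual copy.

```python
import sys

a = [1, 2, 3]
before = sys.getrefcount(a)   # count includes one temporary reference from the call

b = a                         # a second name for the SAME list object
after = sys.getrefcount(a)

same_object = a is b          # identity check: do both names refer to one object?

c = a.copy()                  # shallow copy: a new, independent list object
c.append(99)                  # mutating the copy leaves the original alone

print(after - before)         # how many references the statement `b = a` added
print(same_object, a is c)
print(a, c)
```

If the goal is an independent list, `a.copy()` (or `list(a)`) is the usual choice; plain assignment never copies.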
Input: age = 25; age_str = str(age)
Transformation: The str() conversion function does not modify the
integer `25`. It reads the integer's value, formats it as decimal text, allocates a
separate string object `"25"`, and returns a reference to it.
Output State: You now possess two independent objects in RAM. Both are immutable, and rebinding one name will never affect the other.
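A minimal sketch of the conversion above, using the same `age` and `age_str` names. The extra `+=` step is an addition for illustration: it shows that "changing" the string actually builds another new object and rebinds the name.

```python
age = 25
age_str = str(age)            # builds a new str object; `age` is untouched

kinds = (type(age).__name__, type(age_str).__name__)
distinct = age is not age_str # two separate objects

age_str += " years"           # strings are immutable: this creates a new str and rebinds

print(kinds)
print(distinct)
print(age, age_str)
```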
In standard C, an integer `int x = 5;` is literally just 4 bytes of raw binary
(00000000 00000000 00000000 00000101).
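In CPython, an integer is a C struct called PyLongObject, which begins with the generic PyObject header shared by every object. Every
object you create (not every variable, since names are only references) carries overhead. On a typical 64-bit build the fields are:
- ob_refcnt: (8 bytes) Tracks how many references point to this object.
- ob_type: (8 bytes) A pointer to the type object defining what this object is (e.g., int).
- ob_size: (8 bytes) The item count for variable-length objects; for an int, the number of digits.
- ob_digit: (4+ bytes) The actual numeric data, stored as 30-bit digits of 4 bytes each.
Exact field names vary between CPython versions. This overhead is a large part of why Python is slower and uses more RAM than C++ for numeric work: a small integer that fits in 4 bytes in C typically occupies 28 bytes in 64-bit CPython, because every number is a full heap object.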
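The per-object overhead can be inspected with `sys.getsizeof`. This is a rough sketch: the exact numbers depend on the CPython version and platform, so the comments describe what is typical on a 64-bit build rather than guaranteed values.

```python
import sys

small = sys.getsizeof(1)        # header plus one digit; typically 28 bytes on 64-bit CPython
large = sys.getsizeof(2**100)   # more digits are needed, so the object grows
as_float = sys.getsizeof(3.14)  # header plus one C double; typically 24 bytes

print(small, large, as_float)
```

Python integers have arbitrary precision, which is why the object for `2**100` is larger than the one for `1`.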
Memory layout of x = 3.14:
Namespace Dictionary:
{"x": 0x1A4B2F890} ------> [ PyFloatObject at 0x1A4B2F890 ]
                           | ob_refcnt = 1                 |
                           | ob_type   = float             |
                           | ob_fval   = 3.14              |
Scalar primitives (int, float, bool) are
0-Dimensional. They hold single values.
Sequence types (str) are 1-Dimensional arrays of characters under the hood.
id(variable): Returns an integer that uniquely identifies the object for its lifetime. In CPython this is the object's memory address, shown in base 10 (e.g., `140728994554632`).
type(variable): Returns the object's class (e.g., <class 'int'>).
That class is itself an instance of <class 'type'>.
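A short sketch of both introspection functions. The address printed will differ on every run, and treating `id()` as a memory address is a CPython implementation detail rather than a language guarantee.

```python
x = 3.14

address = id(x)          # an int; in CPython, the object's memory address
kind = type(x)           # the class of the object
meta = type(kind)        # classes are objects too; their type is `type`

print(hex(address))      # hexadecimal form, as addresses are usually written
print(kind)
print(meta)
```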
Integer Caching (-5 to 256):
If you type a = 100 and b = 100, you might expect Python to build
two separate Integer objects. However, a is b returns
True!
Why? To save memory, CPython pre-allocates the integers from -5 to 256 at startup and keeps them alive for the life of the interpreter. Whenever an operation produces a number in this range, CPython hands back a reference to the cached object instead of building a new one.
Outside that range there is no such guarantee. If you enter a = 300 and then b = 300 as separate statements in the interactive interpreter, `a is b` is typically False because two independent objects are allocated. Inside a single script or function the compiler may reuse one constant, so the same test can return True.
This is a CPython implementation detail. Compare numbers with ==, never with is.
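The cache can be observed with a small experiment. This sketch assumes CPython and builds the integers at runtime with `int("...")`, an addition to the original example, so that the compiler cannot merge identical literals into one constant.

```python
# Values inside the cached range: both calls return the same pre-built object.
a = int("100")
b = int("100")
small_shared = a is b

# Values outside the cached range: each call builds its own object.
c = int("300")
d = int("300")
large_shared = c is d
large_equal = c == d      # equality is what you normally want to test

print(small_shared, large_shared, large_equal)
```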
Type Hinting (PEP 484): PEP 484 (Python 3.5) introduced type hints, and PEP 526 (Python 3.6) added variable annotation syntax like
age: int = 25. It's critical to know that this changes absolutely nothing about
Python's dynamic typing or memory allocation. The interpreter does not enforce the : int annotation; it exists for
external static type checkers (like mypy) and editors to read. You can still type `age = "hello"` on the
next line and Python will execute it without a runtime error.
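The following sketch shows annotations being ignored at runtime. The `describe` function is a hypothetical helper introduced only for this illustration.

```python
age: int = 25
age = "hello"                 # no runtime error; a static checker would flag this line


def describe(age: int) -> str:
    """Hypothetical helper: the annotations are metadata, not runtime checks."""
    return f"age={age}"


result = describe("hello")    # accepted even though the hint says int
hints = describe.__annotations__

print(type(age).__name__)
print(result)
print(hints)
```

Running a checker such as mypy over this file is what would report the mismatches; the interpreter itself never does.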
Mistake: Shadowing built-in functions.
list = [1, 2, 3]
Why is this disastrous?: Python's builtins namespace holds a name `list` that
points to the built-in list type. When you execute this assignment, you create a new `list` name in your own scope that points to
your `[1, 2, 3]` list and hides the built-in one. If you try to convert a tuple later using
list((4,5)), your program fails with
TypeError: 'list' object is not callable because the name now resolves to your list object
instead of the type. The built-in is hidden, not destroyed: del list (or builtins.list) brings it back.
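A contained sketch of the mistake. The shadowing is done inside a hypothetical `shadow_demo` function so it cannot leak into the rest of a program, and the `builtins` module is used to show that the real type is still reachable.

```python
import builtins


def shadow_demo():
    """Hypothetical demo: shadow `list` locally and capture the resulting error."""
    list = [1, 2, 3]                  # deliberately bad: hides the built-in type
    try:
        list((4, 5))                  # the name now refers to a list, which is not callable
    except TypeError as exc:
        error = str(exc)
    rebuilt = builtins.list((4, 5))   # the real constructor still lives in builtins
    return error, rebuilt


error, rebuilt = shadow_demo()
print(error)
print(rebuilt)
```

The practical fix is simply a better name, such as `items` or `numbers`.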
Because variables are just references, passing a 1-gigabyte list into a function such as
def process(data): takes O(1) time regardless of the list's size. Python does not copy
1 GB of data into the function scope; it simply passes an 8-byte pointer (on a 64-bit build) to
the original object. The flip side is that any in-place mutation inside the function is visible to the caller.
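A small sketch of argument passing, reusing the `process(data)` signature from the text. The function body and the one-million-element list are illustrative stand-ins for a much larger workload.

```python
def process(data):
    """Return the identity of the argument and mutate it in place."""
    data.append("seen")       # changes the caller's list: no copy was made
    return id(data)


big = list(range(1_000_000))
same_object = process(big) == id(big)

print(same_object)
print(len(big), big[-1])
```

When a function must not modify its input, it can work on `data.copy()` instead, at the cost of an O(n) copy.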
Challenge: Write code to prove whether converting an integer
a = 10 to a float b = float(a) modifies the original object or
creates a new one.
Expected Answer: print(a is b) (equivalently, print(id(a) == id(b))). It outputs False, and a is still the int 10,
proving that converting to a different type builds a new object and leaves the original untouched.
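One way to work through the challenge is sketched below. The last check goes beyond the original question and relies on a CPython implementation detail: converting a value to the exact type it already has can hand back the same object instead of allocating a new one.

```python
a = 10
b = float(a)                  # int -> float: a different type, so a new object

different_objects = a is not b
original_intact = type(a) is int and a == 10

# CPython detail: float() on an exact float returns the same object.
same_type_reuse = float(b) is b

print(different_objects, original_intact, same_type_reuse)
```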
Garbage Collection (Reference Counting and the Cycle Collector): Why doesn't Python exhaust your computer's RAM?
Every time a reference disappears (a name is rebound or deleted, or a function returns and its local variables go away), CPython decrements the `ob_refcnt` inside the object's C struct by 1. The moment that counter hits `0`, CPython deallocates the object immediately. The memory goes back to Python's allocator for reuse and is not necessarily returned to the operating system right away. If two objects hold references to each other (a cyclic reference), their counters never reach 0 on their own. This is why Python includes a secondary backup system (the generational garbage collector, exposed as the gc module) that periodically scans container objects looking for unreachable cycles.
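Both mechanisms can be watched with a weak reference, which observes an object without keeping it alive. This is a sketch assuming CPython; the `Node` class is a hypothetical placeholder, and the automatic collector is switched off briefly so the cycle is only cleaned up by the explicit `gc.collect()` call.

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder object that can hold a reference to another Node."""


# Case 1: no cycle. Dropping the last reference frees the object at once.
n = Node()
probe = weakref.ref(n)
del n
freed_by_refcount = probe() is None

# Case 2: a reference cycle. Reference counting alone cannot free it.
gc.disable()                       # pause automatic collection for the demo
first, second = Node(), Node()
first.other, second.other = second, first
cycle_probe = weakref.ref(first)
del first, second
alive_after_del = cycle_probe() is not None

gc.collect()                       # the cycle collector finds the unreachable pair
freed_by_collector = cycle_probe() is None
gc.enable()

print(freed_by_refcount, alive_after_del, freed_by_collector)
```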