Musings & ramblings of a Pythonista

Understanding Python variables and Memory Management

Have you ever noticed any difference between variables in Python and C? For example, when you do an assignment like the following in C, it actually creates a block of memory space so that it can hold the value for that variable.

int a = 1;

You can think of it as putting the value assigned in a box with the variable name as shown below.

int a =1;

And for all the variables you create a new box is created with the variable name to hold the value. If you change the value of the variable the box will be updated with the new value. That means doing

a = 2;

will result in

a = 2;

Assigning one variable to another makes a copy of the value and put that value in the new box.

int b = a;


But in Python variables work more like tags unlike the boxes you have seen before. When you do an assignment in Python, it tags the value with the variable name.

a = 1

a = 1

and if you change the value of the varaible, it just changes the tag to the new value in memory. You dont need to do the housekeeping job of freeing the memory here. Python's Automatic Garbage Collection does it for you. When a value is without names/tags it is automatically removed from memory.

a = 2

a = 2

Assigning one variable to another makes a new tag bound to the same value as show below.

b = a

b = a

Other languages have 'variables'. Python has 'names'.

A bit about Python's memory management

As you have seen before, a value will have only one copy in memory and all the variables having this value will refer to this memory location. For example when you have variables a, b, c having a value 10, it doesn't mean that there will be 3 copy of 10s in memory. There will be only one 10 and all the variables a, b, c will point to this value. Once a variable is updated, say you are doing a += 1 a new value 11 will be allocated in memory and a will be pointing to this.

Let's check this behaviour with Python Interpreter. Start the Python Shell and try the following for yourselves.

>>> a = 10
>>> b = 10
>>> c = 10
>>> id(a), id(b), id(c)
(140621897573616, 140621897573616, 140621897573616)
>>> a += 1
>>> id(a)

id() will return an objects memory address (object's identity). As you have noticed, when you assign the same integer value to the variables, we see the same ids. But this assumption does not hold true all the time. See the following for example

>>> x = 500
>>> y = 500
>>> id(x)
>>> id(y)

What happened here? Even after assigning the same integer values to different variable names, we are getting two different ids here. These are actually the effects of CPython optimization we are observing here. CPython implementation keeps an array of integer objects for all integers between -5 and 256. So when we create an integer in that range, they simply back reference to the existing object. You may refer the following links for more information.

Stack Overflow: "is" operator behaves unexpectedly with integers

Let's take a look at strings now.

>>> s1 = 'hello'
>>> s2 = 'hello'
>>> id(s1), id(s2)
(4454725888, 4454725888)
>>> s1 == s2
>>> s1 is s2
>>> s3 = 'hello, world!'
>>> s4 = 'hello, world!'
>>> id(s3), id(s4)
(4454721608, 4454721664)
>>> s3 == s4
>>> s3 is s4

Looks interesting, isn't it? When the string was a simple and shorter one the variable names where referring to the same object in memory. But when they became bigger, this was not the case. This is called interning, and Python does interning (to some extent) of shorter string literals (as in s1 and s2) which are created at compile time. But in general, Python string literals creates a new string object each time (as in s3 and s4). Interning is runtime dependant and is always a trade-off between memory use and the cost of checking if you are creating the same string. There's a built-in intern() function to forcefully apply interning. Read more about interning from the following links.

Stack Overflow: Does Python intern Strings?
Stack Overflow: Python String Interning
Internals of Python String Interning

Now we will try to create custom objects and try to find their identities.

>>> class Foo:
...     pass
>>> bar = Foo()
>>> baz = Foo()
>>> id(bar)
>>> id(baz)

As you can see, the two instances have different identities. That means, there are two different copies of the same object in memory. When you are creating custom objects, they will have unique identities unless you are using Singleton Pattern which overrides this behaviour (in __new__()) by giving out the same instance upon instance creation.

Thanks to Jay Pablo (see comments) for correcting the mistakes and making this post a better one.

Tagged under Python, C, Variables, Memory Management

blog comments powered by Disqus