: behavior of data types and operations for mutable and immutable objects in python language
The examples and explanations in this post have the following specifications:
environment: Vagrant virtual machine running Ubuntu 14.04.5 LTS (Linux)
language: Python 3.6.1 (default, Nov 17 2016, 01:08:31)
compiler: gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
The Python documentation at docs.python.org states:
All data in a Python program is represented by objects or by relations between objects.
docs.python.org further explains this concept in reference to PyObject and PyVarObject:
PyObject - All object types are extensions of this type. This is a type which contains the information Python needs to treat a pointer to an object as an object. In a normal “release” build, it contains only the object’s reference count and a pointer to the corresponding type object. It corresponds to the fields defined by the expansion of the PyObject_HEAD macro.
PyVarObject - This is an extension of PyObject that adds the ob_size field. This is only used for objects that have some notion of length. This type does not often appear in the Python/C API. It corresponds to the fields defined by the expansion of the PyObject_VAR_HEAD macro.
The CPython source code, which is written in C and hosted in the Python GitHub repository, defines the typedef for PyObject in the object.h header file. The excerpt below is from the main development branch at the time of writing (Python 3.7.0):
/* Nothing is actually declared to be a PyObject, but every pointer to
 * a Python object can be cast to a PyObject*.  This is inheritance built
 * by hand.  Similarly every pointer to a variable-size Python object can,
 * in addition, be cast to PyVarObject*.
 */
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
Notice that every object includes a pointer to a struct _typeobject (known in the C API as PyTypeObject). This struct holds the attributes, methods, and rules of behavior shared by every object of a given type. The _typeobject struct is roughly 100 lines of C with approximately 60 fields, including the type's name, the size of its instances, and function pointers that implement operations such as deallocation, repr(), hashing, and calling. In other words, it governs how every object a programmer creates in Python behaves. The _typeobject struct can be viewed in the object.h source code referenced above, or in the Python documentation at docs.python.org.
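To make the header fields concrete, here is a small sketch that reads them straight out of memory with ctypes. It assumes a standard (non-debug, non-free-threaded) CPython build, where id() returns the object's address and a list starts with the PyVarObject layout shown above: ob_refcnt, then the ob_type pointer, then ob_size. The helper name read_header is hypothetical, and poking at raw memory like this is for illustration only.

```python
import ctypes

# Assumption: standard CPython release build. The header is laid out as
#   Py_ssize_t ob_refcnt; struct _typeobject *ob_type; Py_ssize_t ob_size;
SSIZE = ctypes.sizeof(ctypes.c_ssize_t)
PTR = ctypes.sizeof(ctypes.c_void_p)


def read_header(obj):
    """Hypothetical helper: return (ob_refcnt, ob_type address, ob_size)."""
    addr = id(obj)  # CPython implementation detail: id() is the address
    refcnt = ctypes.c_ssize_t.from_address(addr).value
    ob_type = ctypes.c_void_p.from_address(addr + SSIZE).value
    # ob_size only exists for variable-size objects such as lists
    ob_size = ctypes.c_ssize_t.from_address(addr + SSIZE + PTR).value
    return refcnt, ob_type, ob_size


x = [10, 20, 30]
refcnt, ob_type, ob_size = read_header(x)
print("ob_refcnt:", refcnt)                           # references CPython is tracking right now
print("ob_type is type(x):", ob_type == id(type(x)))  # True: the pointer leads to the list type object
print("ob_size:", ob_size, "len(x):", len(x))         # both 3 for this list
```

For a list, ob_size is exactly the number of elements, which is why len() on a list is a cheap field lookup rather than a count.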
This blog post discusses some of the attributes and behaviors of object types in Python, and attempts to elaborate on and differentiate between the terms and concepts: object vs. type vs. class vs. id vs. instance vs. value vs. variable name.
: types, classes, instances
In Python, data types and classes are effectively the same thing. The Python documentation at docs.python.org states:
An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself). Like its identity, an object’s type is also unchangeable.
Python has various built-in types that behave much like types in other languages, such as strings, integers, floats, lists (similar to arrays), dictionaries, booleans, and more. Since Python 2.2, Python has implemented "type/class unification", which essentially means that built-in types and user-defined classes work the same way: built-in types are classes, and they can be subclassed just like user-defined ones (fully so in Python 3, where old-style classes were removed). This lets programmers create custom types, called classes, with their own properties, behaviors, and methods.
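A quick way to see type/class unification is to ask type() about a built-in type and a user-defined class, then subclass a built-in. This is a minimal sketch; the bears class mirrors the one used later in this post, and Count with its double() method is a hypothetical example class.

```python
class bears:
    pass


# Built-in types and user-defined classes are both instances of `type`.
print(type(int), type(bears))                          # <class 'type'> <class 'type'>
print(isinstance(int, type), isinstance(bears, type))  # True True


class Count(int):
    """Hypothetical subclass of the built-in int type."""

    def double(self):
        # int arithmetic returns a plain int, so wrap it back into Count
        return Count(self * 2)


c = Count(21)
print(c.double(), isinstance(c, int))  # 42 True
print(type(c.double()).__name__)       # Count
```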
Variables are names bound to object instances in Python. Python allows aliasing, which means that a single program may have multiple variables with different names that all refer to the same object, in the same place in memory, with the same ID. This will be discussed further in the immutable vs. mutable sections. For this introduction, a variable is simply a name given to a value, which is an instance of a type or class.
An instance is an object of a class or type. Instances are the concrete objects a program creates and works with at runtime, as opposed to the class definitions in the Python standard library and source code. Whenever a programmer builds a program in Python, every value they create is a separate instance of some type. For example, in the Python code one = 1, the variable one is bound to the value 1, which is an instance of the type / class int.
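The relationship between names, instances, and classes can be checked directly with isinstance(), type(), and the is operator. A minimal sketch; the names one, uno, bear, and bears are illustrative.

```python
class bears:
    pass


one = 1          # the name `one` is bound to an instance of int
bear = bears()   # the name `bear` is bound to an instance of bears

print(isinstance(one, int), type(one) is int)        # True True
print(isinstance(bear, bears), type(bear).__name__)  # True bears

uno = one        # a second name (an alias) for the very same int object
print(uno is one, id(uno) == id(one))                # True True
```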
To help explain these major concepts, let's look at a Python executable file that I've created. The file below defines a new class, then initializes a list called p_types containing objects of various types, including functions, tuples, integers, sets, and an instance of the new class. Two for loops then walk through the p_types list: the first prints the value and type() of each element, and the second prints the value and id() of each element.
$ cat type-id-value.py
#!/usr/bin/python3
class bears:
    def __init__(self):
        pass

bear = bears()
p_types = [1, 1.0, [1], {1}, {1 : 1}, '1', (1, ), bear, type, print]
print('.............. TYPES ...............')
for x in range(10):
    print('VALUE: {} TYPE: {}'.format(p_types[x], type(p_types[x])))
print("\n.............. ID's ...............")
for x in range(10):
    print('VALUE: {} ID: {}'.format(p_types[x], id(p_types[x])))
Next, let's run the executable file and see the output.
$ ./type-id-value.py | tr '<>' '|'
.............. TYPES ...............
VALUE: 1 TYPE: |class 'int'|
VALUE: 1.0 TYPE: |class 'float'|
VALUE: [1] TYPE: |class 'list'|
VALUE: {1} TYPE: |class 'set'|
VALUE: {1: 1} TYPE: |class 'dict'|
VALUE: 1 TYPE: |class 'str'|
VALUE: (1,) TYPE: |class 'tuple'|
VALUE: |__main__.bears object at 0x7fb4769a6c50| TYPE: |class '__main__.bears'|
VALUE: |class 'type'| TYPE: |class 'type'|
VALUE: |built-in function print| TYPE: |class 'builtin_function_or_method'|

.............. ID's ...............
VALUE: 1 ID: 10055552
VALUE: 1.0 ID: 140413061927176
VALUE: [1] ID: 140413060716552
VALUE: {1} ID: 140413061287496
VALUE: {1: 1} ID: 140413061486536
VALUE: 1 ID: 140413061331464
VALUE: (1,) ID: 140413061331712
VALUE: |__main__.bears object at 0x7fb4769a6c50| ID: 140413060672592
VALUE: |class 'type'| ID: 9897760
VALUE: |built-in function print| ID: 140413080689928
Notice that type() returns the class or type of its argument, and id() returns the ID, a unique identifying number for the object. The program also prints the value of each object for reference. The type() calls show that the custom class bears and the built-in types are all reported as class, followed by the name of the specific class, and the IDs are all unique. Since each ID is unique, each object is its own instance stored in its own place in memory. In fact, in CPython the ID is the object's memory address. docs.python.org has the following description of what the ID of a Python object refers to:
id(object): Return the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value. CPython implementation detail: This is the address of the object in memory.
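The two guarantees in that description (unique among live objects, constant for an object's lifetime) and the one non-guarantee (reuse after an object dies) can be probed with a short sketch. The names live, items, and before are illustrative; the last line's result depends on the allocator, so no output is promised for it.

```python
# Objects that are alive at the same time always have distinct ids.
live = [object() for _ in range(1000)]
print(len({id(o) for o in live}) == len(live))  # True

# An id stays constant for the object's whole lifetime, even as it grows.
items = []
before = id(items)
items.extend(range(10_000))
print(id(items) == before)  # True

# Temporaries whose lifetimes do not overlap MAY reuse an id: CPython often
# hands the freed memory straight to the next object, but nothing guarantees it.
print(id(object()) == id(object()))
```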
CPython is written in C, and id() returns the address of an object's underlying C struct, so the distance between two IDs is a distance in bytes. For more on how C allocates memory, check out my blog post: what not to do is as important as what to do. The code snippet below is an example of how Python IDs behave like memory addresses.
$ cat idmemaddress.py
#!/usr/bin/python3
print(id(0), '--', id(1), '--', id(2), '--', id(3), '--', id(4))
print(id(1), '-', id(0), '=', id(1) - id(0))
$ ./idmemaddress.py
10055520 -- 10055552 -- 10055584 -- 10055616 -- 10055648
10055552 - 10055520 = 32
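Notice how each integer has an ID exactly 32 greater than the previous one. Because IDs are byte addresses, that gap is 32 bytes, not 32 bits: the preallocated small integers sit side by side in a single C array (see the NSMALLPOSINTS section below), and on this 64-bit build each PyLongObject slot in that array takes 32 bytes (a 24-byte PyVarObject header plus one 4-byte digit, padded to an 8-byte boundary).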
: mutable vs. immutable objects
immutable objects
Some immutable types: bool, int, float, complex, tuple, str, bytes, frozenset (Python 2 also had long, which Python 3 merged into int)
Immutable in Python means that once Python creates an instance of an immutable type and allocates memory for it, the object's value cannot change for as long as the object exists.
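A short sketch of what immutability looks like in practice: trying to change a tuple in place raises TypeError, and an "update" to a string actually builds a new object while the old one stays exactly as it was. The names t, s, original, and raised are illustrative.

```python
t = (1, 2, 3)
raised = False
try:
    t[0] = 99  # tuples do not support item assignment
except TypeError as err:
    raised = True
    print("TypeError:", err)  # 'tuple' object does not support item assignment

s = "immutable"
original = s   # keep a second reference so the old object stays alive
s += "!"       # builds a brand-new str and rebinds s; the old one is untouched
print(original, s, original is s)  # immutable immutable! False
```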
preallocation
Since small integers are used so often, and Python would otherwise spend much time allocating and deallocating them, CPython preallocates an array of integer objects for the values -5 through 256 when the interpreter starts. The code snippet below demonstrates two concepts: immutable objects remaining unchanged throughout the program, and Python's preallocation of small integers at startup.
$ cat immutable.py
#!/usr/bin/python3
a = True; b = 256; c = 257; d = b
print('a=', id(a), 'b=', id(b), 'c=', id(c), 'd=', id(d))
b = 258; print('a=', id(a), 'b=', id(b), 'c=', id(c), 'd=', id(d))
$ ./immutable.py
a= 9899456 b= 10063712 c= 139631854534576 d= 10063712
a= 9899456 b= 139631854534448 c= 139631854534576 d= 10063712
$ ./immutable.py
a= 9899456 b= 10063712 c= 140259535511472 d= 10063712
a= 9899456 b= 140259535511344 c= 140259535511472 d= 10063712
Notice that in the above output, the integer 256 always has the ID 10063712 and the boolean True always has the ID 9899456, even across two separate runs of the program. Both objects are allocated statically inside the interpreter (256 in the preallocated small-integer array, True as a built-in singleton), so on this build they sit at the same address every run. On systems that load the interpreter at a randomized address (ASLR), these IDs may differ between runs, although they still never change within a run. The integer 257 is outside the preallocated range, so it is created at runtime and its ID differs between the two runs; within a single run its ID stays the same because 'c' keeps referring to the same object. The example also shows reassignment: when 'b' is reassigned the value 258, the second print statement of each run shows a new ID for 'b', which is the ID of the newly created integer 258. Since 258 is also outside the preallocation range, its ID differs between the two runs.
: NSMALLPOSINTS and NSMALLNEGINTS
The longobject.c file in the CPython repository on GitHub shows how this preallocation process works:
#include "clinic/longobject.c.h"
/*[clinic input]
class int "PyObject *" "&PyLong_Type"
[clinic start generated code]*/
/*[clinic end generated code: output=da39a3ee5e6b4b0d input=ec0275e3422a36e3]*/

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif
...
#if NSMALLNEGINTS + NSMALLPOSINTS > 0
/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
#ifdef COUNT_ALLOCS
Py_ssize_t quick_int_allocs, quick_neg_int_allocs;
#endif
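The above source code shows how CPython preallocates these integers in small_ints, a static array of PyLongObject. With NSMALLNEGINTS = 5 and NSMALLPOSINTS = 257, the array covers -5 (inclusive) through 257 (exclusive), that is, -5 to 256.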
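The preallocated range can also be found empirically from Python. This sketch assumes CPython and relies on the fact that building an int from a string at runtime returns the shared cached object when the value falls in the small-integer range, but a fresh object otherwise. The helper name is_cached is hypothetical; constructing the values with int(str(n)) avoids compile-time constant sharing that could make unrelated literals look identical.

```python
def is_cached(n):
    """Hypothetical helper: True if two runtime-built copies of n are one object."""
    # Both operands stay alive during the `is` comparison, so two separately
    # allocated ints can never share an id here.
    return int(str(n)) is int(str(n))


cached = [n for n in range(-10, 300) if is_cached(n)]
print(cached[0], cached[-1])  # -5 256 on CPython
print(is_cached(256), is_cached(257))  # True False
```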
alias
Notice that 'd' was assigned from 'b' while 'b' referred to the integer 256. At that point 'd' is an alias: both names refer to the same object at the same memory location, which is why they print the same ID. The ID of 'd' stayed the same even after 'b' was reassigned to 258, because assignment only rebinds the name on its left-hand side; it never changes the object 256 or any other name bound to it.
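Rebinding leaves aliases alone regardless of whether the object is mutable, which this sketch shows by repeating the d = b experiment with an int and then with a list. The names b_list and d_list are illustrative.

```python
b = 256
d = b              # d is an alias of the int object 256
b = 258            # rebinds b only
print(d)           # 256

b_list = [256]
d_list = b_list    # alias of the same list object
b_list = [258]     # rebinds b_list to a new list; the old list is untouched
print(d_list)      # [256]
```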
mutable objects
Some mutable types: list, set, dict, bytearray
Mutable in Python means that once Python creates an instance of a mutable type and allocates memory for it, the object may change and hold different values, yet still be the same instance, with the same ID and the same place in memory, for its whole lifetime. The code snippet below uses lists to demonstrate how instances of mutable objects can change while keeping the same ID throughout the program.
$ cat immutable.py
#!/usr/bin/python3
a = [10, 20, 30]
b = [2, 5, 6]
c = [10, 20, 30]
d = a
print('a=', id(a), 'b=', id(b), 'c=', id(c), 'd=', id(d))
a.append(b)
a += [9]
print('a=', id(a), 'b=', id(b), 'c=', id(c), 'd=', id(d))
a = [9]
print('a=', id(a), 'b=', id(b), 'c=', id(c), 'd=', id(d))
print(a, '--', b, '--', c, '--', d)
$ ./immutable.py
a= 140118659929736 b= 140118659929800 c= 140118659929928 d= 140118659929736
a= 140118659929736 b= 140118659929800 c= 140118659929928 d= 140118659929736
a= 140118659929864 b= 140118659929800 c= 140118659929928 d= 140118659929736
[9] -- [2, 5, 6] -- [10, 20, 30] -- [10, 20, 30, [2, 5, 6], 9]
Notice that in the above program, the ID of 'a' stays the same through each mutation of 'a'. Unlike the small integers, the lists 'a' and 'c' have different IDs although they have the same value: every list display such as [10, 20, 30] creates a new list object, and because lists are mutable, Python cannot safely share one object between them (a change made through one name would show up in the other). Variable 'd' shares the same ID as 'a' because d = a binds 'd' to the same list object. The lines a.append(b) and a += [9] mutate that object in place, so its ID does not change. The line a = [9], however, is simply a reassignment, not a mutation: it binds 'a' to a brand-new list. Since 'd' still refers to the original object, the ID of 'd' remains unchanged, and its value reflects all the mutations made through 'a' before 'a' was reassigned.
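One subtlety behind a += [9] is worth spelling out: for a list, += calls list.__iadd__, which extends the existing object, whereas a = a + [...] builds a new list and rebinds the name. Tuples have no in-place version, so += on a tuple always creates a new object. A small sketch with illustrative names:

```python
a = [1, 2]
alias = a
a += [3]           # list.__iadd__ extends the same object in place
print(alias)       # [1, 2, 3]
a = a + [4]        # list.__add__ builds a new list, then rebinds a
print(alias, a)    # [1, 2, 3] [1, 2, 3, 4]

t = (1, 2)
t_alias = t
t += (3,)          # no in-place add for tuples: a new tuple is created
print(t_alias, t)  # (1, 2) (1, 2, 3)
```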
: why it matters
A classic example of why understanding object mutability matters is a program that builds a massive string by repeatedly appending other strings to it. Because strings are immutable, you cannot change one in place; each s += t conceptually creates a new string object, with new memory and a new ID, and rebinds the variable to it. Lists are mutable, so a common alternative is to collect the pieces in a list and call the str method join() once at the end: join() computes the total length and copies every piece into a single new string, instead of creating an intermediate string at every step. The difference in the output below is small partly because CPython can often resize a string in place during s += t when no other name refers to it, an implementation detail that other Python implementations do not guarantee. Below is the code for such a scenario, in which I use the time module to measure the runtime of each function. You will notice that list_main(), which builds the list, is slightly faster.
$ cat run-time.py
#!/usr/bin/python3
import time
import sys


def string_main():
    string1 = ''
    string2 = ''
    for i in range(0, 483647):
        string1 += str(i)
    for i in range(0, 5):
        string2 += str(i)
    print('time to make something like this: ', string2)


def list_main():
    string1 = ''.join([str(i) for i in range(0, 483647)])
    string2 = ''.join([str(i) for i in range(0, 5)])
    print('time to make something like this: ', string2)


start_time1 = time.time()
string_main()
end_time1 = time.time()
print("Time to run string() =", (end_time1 - start_time1))
start_time2 = time.time()
list_main()
end_time2 = time.time()
print("Time to run list() =", (end_time2 - start_time2))
$ ./run-time.py
time to make something like this:  01234
Time to run string() = 0.10157418251037598
time to make something like this:  01234
Time to run list() = 0.09430074691772461
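To see what happens when CPython's in-place shortcut for s += t cannot apply, here is a variation using the timeit module. It is a sketch, not the benchmark above: the function names and the extra keep reference (a hypothetical device that gives the string a second owner) are assumptions for illustration, and the timings depend entirely on the machine and Python version, so no numbers are promised. On CPython, the aliased version is expected to be noticeably slower, because every iteration must copy the whole string built so far.

```python
import timeit


def concat_plus(n):
    s = ""
    for i in range(n):
        s += str(i)  # CPython can often grow s in place here
    return s


def concat_plus_aliased(n):
    s = ""
    for i in range(n):
        keep = s     # hypothetical second reference: s can no longer be resized in place
        s += str(i)  # so a full copy is made on every iteration
    return s


def concat_join(n):
    # join() sizes the result once and copies each piece a single time
    return "".join([str(i) for i in range(n)])


N = 10_000
for fn in (concat_plus, concat_plus_aliased, concat_join):
    seconds = timeit.timeit(lambda: fn(N), number=3)
    print(f"{fn.__name__:<20} {seconds:.4f} s")
```

All three functions build the same string; only the amount of copying differs.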
: scope of mutable and immutable objects
Another example of why knowing mutable vs. immutable objects matters is the scope of variables as they are accessed by different functions throughout an application. Python passes arguments by assignment: each parameter is a new local name bound to the same object the caller passed in, somewhat like passing a pointer by value in C. If a function mutates that object, the caller sees the change; if the function instead rebinds the parameter to a new object, only its local name changes. To explain this concept, I've created another mini example:
$ cat scope-of-variables.py
#!/usr/bin/python3
def example(a_list, a_string):
    a_list.append([1, 2, 3])
    a_string = "this is a new " + a_string
    print("inside example function after modifications:\n", a_list, a_string)


def main():
    my_list = [4, 5, 6]
    my_string = "my string"
    print("initialized 2 instances of list and string objects:\n", my_list, my_string)
    example(my_list, my_string)
    print("final state of both objects after function:\n", my_list, my_string)


main()
$ ./scope-of-variables.py
initialized 2 instances of list and string objects:
 [4, 5, 6] my string
inside example function after modifications:
 [4, 5, 6, [1, 2, 3]] this is a new my string
final state of both objects after function:
 [4, 5, 6, [1, 2, 3]] my string
Notice that in the above program, a mutable list and an immutable string are created in main(), so two objects with two different IDs and places in memory are initialized. Both are passed as arguments to the example() function, which modifies them and returns nothing. The print() statements confirm that the change to the list made inside example() is visible in main(), because example() mutated the very list object that main() refers to. The reassignment of the string, however, is not preserved outside the scope of example(): "this is a new " + a_string creates a new string object at a new address in memory and rebinds only the local name a_string, while my_string in main() still refers to the original string.
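Two common patterns follow from this behavior: return a new value and let the caller rebind it when working with immutable objects, and copy a mutable argument before changing it when the caller's object should stay untouched. The sketch below also checks that parameters start out bound to the caller's objects. All function names here (same_objects, add_suffix, appended_copy) are hypothetical.

```python
def same_objects(a_list, a_string, list_id, string_id):
    # Parameters are new local names bound to the caller's objects.
    print(id(a_list) == list_id, id(a_string) == string_id)  # True True
    a_string = a_string + "!"  # rebinding: only the local name changes
    print(id(a_string) == string_id)  # False


def add_suffix(a_string):
    # str is immutable, so return the new object and let the caller rebind.
    return a_string + "!"


def appended_copy(a_list, item):
    # Copy first so the caller's list is not mutated.
    new_list = list(a_list)
    new_list.append(item)
    return new_list


my_list = [4, 5, 6]
my_string = "my string"
same_objects(my_list, my_string, id(my_list), id(my_string))

my_string = add_suffix(my_string)  # rebind in the caller's own scope
other = appended_copy(my_list, 7)
print(my_string)       # my string!
print(my_list, other)  # [4, 5, 6] [4, 5, 6, 7]
```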