Passing around Arrays in Python (as compared to C)

by Amit


For those of you with a C/C++ background (at some point of time, like me), passing around Python’s arrays (or lists) to functions can lead to a little bewilderment. For example, consider the following code snippet:

from array import array

def func(arr):
    print('Object Identifier: {0:d}'.format(id(arr)))
    for i in range(0,len(arr)):
        arr[i] = arr[i]**2

if __name__=='__main__':

    arr=array('i',[1,2,3])
    print(arr)
    print('Object Identifier: {0:d}'.format(id(arr)))
    # results in passing a reference to the
    # the original array
    func(arr)
    print(arr)

When you run the above snippet, it results in the following output:

array('i', [1, 2, 3])
Object Identifier: 3075060736
Object Identifier: 3075060736
array('i', [1, 4, 9])

It is a little surprising here, since the array is invoked in a call by value manner in C/C++. So why did the original array change? The answer lies in Python’s mechanism of passing a reference to the original object (the array object) itself, rather than a copy of it to the called function. Since, an array is a mutable object, the changes made to it is reflected in the original array sent. (You may think of this as equivalent to passing a pointer to the function). This idea of reference is better understood when you print the identifier of the array object in the caller and the called functions. As you can see, both of them are same. This won’t affect immutable objects such as a string, or an integer.

So, what do you do if you don’t want the changes made to an immutable object to propagate to the original object? Use the copy module. For example:

import copy
from array import array

def func(arr):
    print('Object Identifier: {0:d}'.format(id(arr)))
    for i in range(0,len(arr)):
        arr[i] = arr[i]**2

if __name__=='__main__':

    arr=array('i',[1,2,3])
    print(arr)
    print('Object Identifier: {0:d}'.format(id(arr)))
    # Pass the copy of 
    # the original array
    myarr = copy.copy(arr)
    func(myarr)

    print(arr)

The above code results in the following output:

array('i', [1, 2, 3])
Object Identifier: 3075063072
Object Identifier: 3075060736
array('i', [1, 2, 3])

As can be seen from the object identifiers, they are two separate objects and hence changes made in the called function is not propagated to the original object.

Code: https://gist.github.com/3487468

Note that I have used Python 3.3 for the examples.

Some related links:

Advertisements