2024-heraklion-data/notebooks/020_numpy/001_numpy_views_and_copies.ipynb

In [3]:
import numpy as np

def print_info(a):
    """ Print the content of an array, and its metadata. """
    
    txt = f"""
dtype\t{a.dtype}
ndim\t{a.ndim}
shape\t{a.shape}
strides\t{a.strides}
    """

    print(a)
    print(txt)
In [4]:
x = np.arange(12).reshape(3, 4).copy()
print_info(x)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

dtype	int64
ndim	2
shape	(3, 4)
strides	(32, 8)
    

Views

Operations that can be carried out by changing only the metadata (shape, strides, offset) do exactly that, and return a view.

In [6]:
# slice
y = x[0::2, 1::2]
print_info(y)
[[ 1  3]
 [ 9 11]]

dtype	int64
ndim	2
shape	(2, 2)
strides	(64, 16)
    

A view shares the same memory block as the original array.
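A quick way to check this claim is sketched below. It uses `np.shares_memory` and the `.base` attribute, which are standard NumPy but are not used elsewhere in this notebook; the transpose `t` is an extra example added here for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4).copy()  # .copy() makes x own its memory block

y = x[0::2, 1::2]   # slice
t = x.T             # transpose (added here as a second metadata-only operation)

for name, view in [("slice", y), ("transpose", t)]:
    # .base points at the array that owns the memory (None if the array owns it)
    print(name, "| base is x:", view.base is x,
          "| shares memory:", np.shares_memory(x, view))

print("x owns its data:", x.base is None)
```

Both checks report `True` for the slice and for the transpose, since neither operation allocates a new memory block.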

In [7]:
z = x.reshape(1, 12)
print_info(z)
[[ 0  1  2  3  4  5  6  7  8  9 10 11]]

dtype	int64
ndim	2
shape	(1, 12)
strides	(96, 8)
    

CAREFUL: Modifying the view changes the original array and all other views of that array as well!

In-place operations
In [8]:
print(y) # a view of x
[[ 1  3]
 [ 9 11]]
In [9]:
y += 100 
print_info(y)
[[101 103]
 [109 111]]

dtype	int64
ndim	2
shape	(2, 2)
strides	(64, 16)
    
In [10]:
print_info(x)
print_info(z)
[[  0 101   2 103]
 [  4   5   6   7]
 [  8 109  10 111]]

dtype	int64
ndim	2
shape	(3, 4)
strides	(32, 8)
    
[[  0 101   2 103   4   5   6   7   8 109  10 111]]

dtype	int64
ndim	2
shape	(1, 12)
strides	(96, 8)
    

Functions that take an array as an input should avoid modifying it in place!

Either make a copy first, or state very clearly in the docstring that the input is modified.

In [12]:
def robust_log(x, cte=3):
    """
    Returns the log of an array, deals with values that are equal to 0.

    `x` is expected to have non-negative values.
    """
    # WARNING: this line modifies the caller's array in place
    x[x == 0] += cte
    return np.log(x)

# the docstring does not say that `x` is modified: this is not being very clear
In [13]:
a = np.array([[96, 0.01], [0, 1]])
In [14]:
# a view of `a`
b = a[1, :]
print_info(b)
[0. 1.]

dtype	float64
ndim	1
shape	(2,)
strides	(8,)
    
In [15]:
robust_log(a)
Out[15]:
array([[ 4.56434819, -4.60517019],
       [ 1.09861229,  0.        ]])
In [16]:
np.set_printoptions(suppress=True)
b
Out[16]:
array([3., 1.])

Better to make a copy!

In [17]:
def robust_log(x, cte=3):
    """ Returns the log of an array, deals with values that are 0.

    `x` is expected to have non-negative values.
    """
    # work on a private copy, so the caller's array is left untouched
    x = x.copy()
    x[x == 0] += cte
    return np.log(x)
In [18]:
a = np.array([[96, 0.01], [0, 1]])
b = a[1, :]
print(b)
robust_log(a)
[0. 1.]
Out[18]:
array([[ 4.56434819, -4.60517019],
       [ 1.09861229,  0.        ]])
In [19]:
b
Out[19]:
array([0., 1.])
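As an alternative sketch, the zeros can be replaced without any in-place assignment, for example with `np.where`, which always builds a new array. The name `robust_log_where` is hypothetical and is not part of the original notebook.

```python
import numpy as np

def robust_log_where(x, cte=3):
    """Hypothetical variant of robust_log that never writes into `x`.

    `x` is expected to have non-negative values.
    """
    x = np.asarray(x)
    # np.where returns a new array: zeros are replaced by `cte`, the rest is kept
    return np.log(np.where(x == 0, cte, x))

a = np.array([[96, 0.01], [0, 1]])
b = a[1, :]               # a view of `a`
result = robust_log_where(a)
print(result)
print(b)                  # still [0. 1.]: neither `a` nor its view was touched
```

Whether this or an explicit `.copy()` reads better is a matter of taste; both leave the caller's data unchanged.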

Copies

  • Operations that cannot be executed by changing the metadata create a new memory block, and return a copy
  • A copy can be forced with the .copy() method
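Both bullet points can be checked with a small sketch. The second case, reshaping a transposed array, is an example added here (it does not appear in the notebook) of an operation that looks metadata-only but is not: the transposed elements are not laid out in memory in the order the flat result needs.

```python
import numpy as np

x = np.arange(12).reshape(3, 4).copy()

forced = x[0::2, 1::2].copy()   # the slice alone would be a view; .copy() forces a new block
flat_t = x.T.reshape(12)        # cannot be described by strides alone, so NumPy copies
flat_x = x.reshape(12)          # x is contiguous, so this one is still a view

print(np.shares_memory(x, forced))  # False
print(np.shares_memory(x, flat_t))  # False
print(np.shares_memory(x, flat_x))  # True
```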

Choosing rows, columns, or individual elements of an array by explicitly giving their indices (a.k.a. "fancy indexing") is an operation that in general cannot be executed by changing the metadata alone.

Therefore, fancy indexing always returns a copy.

In [20]:
x = np.arange(12).reshape(3, 4).copy()
print_info(x)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

dtype	int64
ndim	2
shape	(3, 4)
strides	(32, 8)
    
In [21]:
#print(x)
z = x[[0, 0, 2], [1, 0, 3]]
# what's z equal to?

print_info(z)
[ 1  0 11]

dtype	int64
ndim	1
shape	(3,)
strides	(8,)
    
In [22]:
z += 1000
print_info(z)

# the original array is unchanged => not a view!
print_info(x)
[1001 1000 1011]

dtype	int64
ndim	1
shape	(3,)
strides	(8,)
    
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

dtype	int64
ndim	2
shape	(3, 4)
strides	(32, 8)
    

Views are created when the same data can be read using different strides. Slicing and regular indexing allow that, because NumPy knows how many bytes to step in each dimension to reach the next element.
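To make the byte steps concrete, here is a minimal sketch that rebuilds the slice `x[0::2, 1::2]` by hand from a start offset and two strides. The dtype is fixed to `int64` so that the item size is 8 bytes on every platform; the variable names are chosen for this illustration only.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4).copy()
y = x[0::2, 1::2]

# The strides of the slice are the original strides times the slice steps (2 and 2)
expected_strides = (x.strides[0] * 2, x.strides[1] * 2)
print(y.strides, expected_strides)   # (64, 16) (64, 16)

# Byte offset of y[0, 0] inside x's memory block: row 0, column 1
offset = 0 * x.strides[0] + 1 * x.strides[1]

# Walk the memory block by hand: offset + i * stride_0 + j * stride_1
flat = x.ravel()   # x is contiguous, so this is a flat view of the same block
manual = [
    [int(flat[(offset + i * y.strides[0] + j * y.strides[1]) // x.itemsize])
     for j in range(y.shape[1])]
    for i in range(y.shape[0])
]
print(manual)      # [[1, 3], [9, 11]]
```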

Fancy indexing does not allow that, because the data you are asking for cannot be obtained by just changing the strides. Thus, NumPy needs to create a copy of it in memory.
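The same reasoning can be sketched for the fancy-indexing example from above: computing the byte offset of each requested element shows that they are not evenly spaced, so no single stride can describe them. The dtype is again fixed to `int64` for reproducible byte counts.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4).copy()
rows, cols = [0, 0, 2], [1, 0, 3]

# Byte offset of each requested element inside x's memory block
offsets = [r * x.strides[0] + c * x.strides[1] for r, c in zip(rows, cols)]
# Step in bytes between consecutive requested elements
steps = [b - a for a, b in zip(offsets, offsets[1:])]

print(offsets)   # [8, 0, 88]
print(steps)     # [-8, 88]: not constant, so no stride fits

z = x[rows, cols]
print(np.shares_memory(x, z))   # False: z lives in a new memory block
```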