Numpy arrays¶

In [1]:

import numpy as np

Numpy arrays in memory (representation)¶

reminder

In [4]:

X = np.arange(0,9).reshape(3,3)
print(X)

[[0 1 2]
 [3 4 5]
 [6 7 8]]

[Figure: memory_lists1]

In [5]:

# flatten
X.ravel()

Out[5]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [8]:

# transpose
X.T

Out[8]:

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

In [9]:

# slice
X[::2, ::2]

Out[9]:

array([[0, 2],
       [6, 8]])

[Figure: numpy_meta]

[Figure: numpy_magic]

Views and Copies: an important distinction!¶

View

a new array object that accesses the same data buffer; no data is duplicated
regular indexing and slicing give views
in-place operations on a view also modify the original array

Copy

a new array created by duplicating the data buffer as well as the array metadata
fancy indexing always gives a copy
a copy can be forced with the .copy() method
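The three cases in the list above can be checked directly. The sketch below uses `np.shares_memory` to test whether two arrays overlap in memory; the variable names are only illustrative.

```python
import numpy as np

a = np.arange(10)

sliced = a[2:8:2]          # regular slicing: a view
fancy = a[[2, 4, 6]]       # fancy (integer-array) indexing: a copy
forced = a[2:8:2].copy()   # explicit copy of what would otherwise be a view

print(np.shares_memory(a, sliced))   # True
print(np.shares_memory(a, fancy))    # False
print(np.shares_memory(a, forced))   # False

# An in-place operation on the view writes through to a
sliced += 100
print(a)       # [  0   1 102   3 104   5 106   7   8   9]
print(fancy)   # [2 4 6], the copy is unaffected
```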

How to know? Check the .base attribute

In [11]:

def is_view(a, x):  # True if a is a view whose base is exactly the array x
    return a.base is x

In [12]:

a = np.arange(1,7)
print('a = ',a)

a =  [1 2 3 4 5 6]

In [13]:

# create slice of a and print its base
a_slice = a[2:5]

print('a_slice = ', a_slice)
print('The base of a_slice is ', a_slice.base)

print('Is a_slice a view of a?', is_view(a_slice, a))

a_slice =  [3 4 5]
The base of a_slice is  [1 2 3 4 5 6]
Is a_slice a view of a? True

In [14]:

# create a copy of a and print its base

a_copy = np.reshape(a, (2,3)).copy()

print('a_copy = ', a_copy)
print('the base of a_copy ', a_copy.base)
print('a and a_copy have the same base ', is_view(a_copy, a))

a_copy =  [[1 2 3]
 [4 5 6]]
the base of a_copy  None
a and a_copy have the same base  False

In [15]:

# reshape returns a view here, not a copy
a_2_3 = np.reshape(a, (2,3))

# b is a view of a view
b = np.reshape(a_2_3, (2,3))
print(b)
print(is_view(b, a))
print(is_view(a_2_3, a))
# False: .base points to the array that owns the data (a), not to the intermediate view a_2_3
print(is_view(b, a_2_3))

[[1 2 3]
 [4 5 6]]
True
True
False
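The last result is False even though b was built from a_2_3. As far as I know, NumPy collapses chains of views so that .base refers to the array that owns the memory, which here is a. A sketch that makes this visible, and that uses `np.shares_memory` as a check that does not depend on which object .base points to:

```python
import numpy as np

a = np.arange(1, 7)
a_2_3 = np.reshape(a, (2, 3))   # view of a
b = np.reshape(a_2_3, (2, 3))   # view of a view

print(a.base is None)     # True: a owns its data
print(b.base is a)        # True: base is the owner of the memory
print(b.base is a_2_3)    # False: the intermediate view is skipped

# b and a_2_3 still look at the same memory
print(np.shares_memory(b, a_2_3))   # True
```

If the question is "will writing to one array change the other?", `np.shares_memory` answers it more directly than comparing .base.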

Because a copy is a different array in memory, modifying it will not change the base array.

In [17]:

a = np.arange(1, 7)

#  create a copy
a_copy = np.reshape(a, (2,3)).copy()

a_copy[1,1] = 666

print('a ', a)
print('a_copy ', a_copy)

a  [1 2 3 4 5 6]
a_copy  [[  1   2   3]
 [  4 666   6]]

change an element in the copy, print original array¶

In [18]:

a = np.arange(1, 7)

#  create a copy
a_copy = np.reshape(a, (2,3)).astype('float64')
a_view = np.reshape(a, (2,3))

a_copy[1,1] = 666.44
a_view[1,1] = 101.6555 # the value is truncated to 101: the data type of the original array (int) stays the same

print('a ', a)
print('a_view: ', a_view)
print('a_view strides: ', a_view.strides)
print('a_copy ', a_copy)
print('a_copy strides: ', a_copy.strides)
print('a_copy base: ', a_copy.base)

a  [  1   2   3   4 101   6]
a_view:  [[  1   2   3]
 [  4 101   6]]
a_view strides:  (24, 8)
a_copy  [[  1.     2.     3.  ]
 [  4.   666.44   6.  ]]
a_copy strides:  (24, 8)
a_copy base:  None

The same operation on a view, however, carries the change over to the original array.

Take-away: you do need to know if you are using a view or a copy, particularly when you are operating on the array in-place
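A sketch of how this can bite in practice: a function that works in place changes the caller's data when it is handed a view, but not when it is handed a copy. `center_rows` is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def center_rows(block):
    """Subtract each row's mean, in place (hypothetical helper)."""
    block -= block.mean(axis=1, keepdims=True)
    return block

data = np.arange(12, dtype=float).reshape(3, 4)

# Passing a copy: data is left alone
safe = center_rows(data[:2].copy())
print(data[0])   # [0. 1. 2. 3.]

# Passing a slice (a view): the first two rows of data are overwritten
center_rows(data[:2])
print(data[0])   # [-1.5 -0.5  0.5  1.5]
print(data[2])   # [ 8.  9. 10. 11.], outside the slice, so untouched
```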

1.2.1 Strides - why some indexing gives copies and others views?¶

How does numpy arrange data in memory? When you create an array, numpy allocates a block of memory whose size depends on the number of elements and on the data type you choose.

In [20]:

a = np.arange(9).reshape(3,3)
print(a)

[[0 1 2]
 [3 4 5]
 [6 7 8]]

In [21]:

a.itemsize

Out[21]:

8

In this example the array has 8 bytes allocated per item.
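The item size follows from the dtype, so the same 3x3 array takes a different amount of memory for different types. A short sketch (the default integer type of `np.arange` can depend on platform and NumPy version, so the dtypes are given explicitly here):

```python
import numpy as np

for dtype in (np.int8, np.int32, np.int64, np.float32, np.float64):
    arr = np.arange(9, dtype=dtype).reshape(3, 3)
    # total bytes = number of items * bytes per item
    assert arr.nbytes == arr.size * arr.itemsize
    print(np.dtype(dtype).name, arr.itemsize, arr.nbytes)

# int8 1 9
# int32 4 36
# int64 8 72
# float32 4 36
# float64 8 72
```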

Memory is linear, which means the 2-D array is laid out in memory something like this (blue boxes):

[Figure: linear_mem]

However, the user 'sees' the array in 2D (green boxes).

How does numpy accomplish this? By defining strides.

In [22]:

a.strides

Out[22]:

(24, 8)

Strides tell you by how many bytes you should move in memory when moving one step in that dimension.

[Figure: strides]

To go from the first item in the first row to the first item in the second row, you need to move 3 * 8 = 24 bytes. To move to the next column within a row, you only need to move 8 bytes.
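The same arithmetic can be written out in code. This sketch assumes a C-ordered (row-major) int64 array; it recomputes the strides from the shape and item size, then uses them to find the byte offset of one element in the raw buffer.

```python
import numpy as np

a = np.arange(9, dtype=np.int64).reshape(3, 3)
n_rows, n_cols = a.shape

# Row-major layout: one row step skips n_cols items, one column step skips 1 item
expected = (n_cols * a.itemsize, a.itemsize)
print(expected, a.strides)   # (24, 8) (24, 8)

# Byte offset of element (i, j) = i * stride_0 + j * stride_1
i, j = 2, 1
offset = i * a.strides[0] + j * a.strides[1]   # 2*24 + 1*8 = 56

# Read the item at that offset from a byte-for-byte copy of the buffer
raw = a.tobytes()
value = np.frombuffer(raw[offset:offset + a.itemsize], dtype=a.dtype)[0]
print(offset, value, a[i, j])   # 56 7 7
```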

Views are created when the same data can be read with a different set of strides. Slicing and regular indexing allow that, because the requested elements are always a fixed number of bytes apart.

Fancy indexing does not allow that, because the elements you ask for are generally not evenly spaced in memory and cannot be reached just by changing the strides. Thus, numpy needs to make a copy of them in memory.
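To make this concrete, the sketch below prints the strides of a few views of the same int64 array, and then picks three elements with fancy indexing whose byte offsets (8, 56, 24) are not evenly spaced, so no single stride could describe them.

```python
import numpy as np

a = np.arange(9, dtype=np.int64).reshape(3, 3)

print(a.strides)             # (24, 8)
print(a[::2, ::2].strides)   # (48, 16): every second row and column
print(a.T.strides)           # (8, 24): transpose just swaps the strides
print(a[:, 1].strides)       # (24,): one column, stepping row by row

# Fancy indexing: elements (0,1), (2,1), (1,0) live at byte offsets 8, 56, 24
picked = a[[0, 2, 1], [1, 1, 0]]
print(picked, np.shares_memory(a, picked))   # [1 7 3] False
```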

Now, you can change the strides of an array at will.

In [23]:

a.strides=(8,24)
a

Out[23]:

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])
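Assigning to .strides modifies the array itself, and the NumPy documentation discourages doing so in favour of `numpy.lib.stride_tricks.as_strided`. A minimal sketch of the same stride swap done that way, assuming an int64 array; the result is a new view and a keeps its own strides:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(9, dtype=np.int64).reshape(3, 3)

# Read the same buffer with swapped strides; writeable=False guards
# against accidental writes through a hand-built view
swapped = as_strided(a, shape=(3, 3), strides=a.strides[::-1], writeable=False)

print(swapped)
print(np.array_equal(swapped, a.T))   # True
print(a.strides)                      # (24, 8): a itself is unchanged
```

`as_strided` does not validate the strides against the buffer, so the same warning about nonsensical values applies to it.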

But be careful! Changing the strides to something nonsensical will also give you nonsense, and numpy will not complain.

In [24]:

a.strides=(8, 9)

Exercises on indexing, views/copies¶

Exercise 1: indexing, dimensionality of the output, view or copy?¶

Look at the following code examples and, before running them, try to answer for each case:
(1) what is the dimensionality of the result?
(2) is the result a view or a copy?

In [25]:

x  = np.arange(0,12).reshape(3,4)

In [26]:

x[::2, :] #dim, view or copy

#is_view(x[::2, :], x.base)

Out[26]:

array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])

In [27]:

x[1, :]

#is_view(x[1, :], x.base)

Out[27]:

array([4, 5, 6, 7])

In [45]:

x[1]

#is_view(x[1], x.base)

Out[45]:

False

In [59]:

print(x)
x[[1, 2, 0], [1, 1, 2]]

#is_view(x[[1, 2, 0], [1, 1, 2]], x.base)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Out[59]:

array([5, 9, 2])

Fancy indexing¶

[Figure: fancy]

In [61]:

x[[0, 2], :]

#is_view(x[[0, 2], :], x.base)

Out[61]:

False

In [62]:

x.reshape((6, 2))

#is_view(x.reshape((6, 2)), x.base)

Out[62]:

True

In [ ]:

x.ravel()

#is_view(x.ravel(), x.base)

In [ ]:

x.T.ravel()

#is_view(x.T.ravel(), x.base)

In [68]:

x[(x % 2) == 1]

#is_view(x[(x % 2) == 1], x.base)

Out[68]:

(32, 8)

In [82]:

y = x + 2

##### Is this because 

#is_view(y, x)

Out[82]:

False

In [83]:

y = np.sort(x, axis=1)

#is_view(y, x)

Out[83]:

False
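Once you have written down your answers, one way to check them all at once is the sketch below. It is not part of the original exercise: it uses `np.shares_memory` rather than the is_view helper, so the result does not depend on which object .base refers to.

```python
import numpy as np

x = np.arange(0, 12).reshape(3, 4)

# The expressions from the exercise, keyed by their source text
cases = {
    "x[::2, :]": x[::2, :],
    "x[1, :]": x[1, :],
    "x[1]": x[1],
    "x[[1, 2, 0], [1, 1, 2]]": x[[1, 2, 0], [1, 1, 2]],
    "x[[0, 2], :]": x[[0, 2], :],
    "x.reshape((6, 2))": x.reshape((6, 2)),
    "x.ravel()": x.ravel(),
    "x.T.ravel()": x.T.ravel(),
    "x[(x % 2) == 1]": x[(x % 2) == 1],
    "x + 2": x + 2,
    "np.sort(x, axis=1)": np.sort(x, axis=1),
}

report = {}
for label, v in cases.items():
    # A result that overlaps x in memory is a view; otherwise it is a copy
    kind = "view" if np.shares_memory(v, x) else "copy"
    report[label] = (v.ndim, kind)
    print(f"{label:26s} ndim={v.ndim}  {kind}")
```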

Sources + other resources¶

ASPP Bilbao 2022 - Lisa Schwetlick & Aina Frau-Pascual https://github.com/ASPP/2022-bilbao-advanced-numpy

Scipy lecture notes, 2022.1

Basic Numpy: http://scipy-lectures.org/intro/numpy/index.html
Advanced Numpy: http://scipy-lectures.org/advanced/advanced_numpy/index.html

Numpy chapter in "Python Data Science Handbook" https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html

Further resources on strides:
