Numpy arrays¶
import numpy as np
Numpy arrays in memory (representation)¶
Reminder: a few basic operations on a 2-D array.
X = np.arange(0,9).reshape(3,3)
print(X)
# flatten
print(X.ravel())
# transpose
print(X.T)
# slice: every other row and every other column
print(X[::2, ::2])
Views and Copies: an important distinction!¶
View
- a new array object that reads the data buffer of the original array, without duplicating it
- regular (basic) indexing and slicing give views
- in-place operations can be done on views, and they modify the original array
Copy
- a new array created by duplicating the data buffer as well as the array metadata
- fancy indexing always gives copies
- a copy can be forced with the method .copy()
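A minimal sketch of the three rules above in action, using a small throwaway array (the names a, s, f and c are only for illustration): an in-place operation on a slice writes into the original, while writing into a fancy-indexed result or an explicit .copy() does not.

```python
import numpy as np

a = np.arange(6)          # [0 1 2 3 4 5], owns its data
s = a[1:4]                # basic slicing: a view on a's buffer
s *= 10                   # in-place operation on the view...
print(a)                  # ...modifies a: [ 0 10 20 30  4  5]

f = a[[0, 2, 4]]          # fancy indexing: always a copy
f[:] = -1                 # writing into the copy...
print(a)                  # ...leaves a untouched: [ 0 10 20 30  4  5]

c = a[1:4].copy()         # .copy() forces a copy, even of a slice
c[0] = 99
print(a[1])               # still 10
```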
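How to tell? Check the .base attribute: a view's base is the array that owns the data, while an array that owns its own data (such as a copy) has base None.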
def is_view(a, x):
    # True if the base of a is the array x itself, i.e. a is a view on x's data
    return a.base is x
a = np.arange(1,7)
print('a = ',a)
# create slice of a and print its base
a_slice = a[2:5]
print('a_slice = ', a_slice)
print('The base of a_slice is ', a_slice.base)
print('Is a_slice a view of a?', is_view(a_slice, a))
# create a copy of a and print its base
a_copy = np.reshape(a, (2,3)).copy()
print('a_copy = ', a_copy)
print('the base of a_copy ', a_copy.base)
print('Is a_copy a view of a?', is_view(a_copy, a))
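# Views of views: a_2_3 is a view of a, and b is a view of a_2_3
a_2_3 = np.reshape(a, (2,3))
b = np.reshape(a_2_3, (2,3))
print(b)
print(is_view(b, a))      # True
print(is_view(a_2_3, a))  # True
# False: base points to the array that owns the memory (here a),
# not to the intermediate view a_2_3, even though b shares a_2_3's data
print(is_view(b, a_2_3))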
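Because .base skips over intermediate views and points straight to the array that owns the memory, comparing bases can be misleading for chains of views. A sketch of a more direct test uses np.shares_memory, which asks whether two arrays overlap in memory regardless of how they were derived:

```python
import numpy as np

a = np.arange(1, 7)
a_2_3 = np.reshape(a, (2, 3))      # view of a
b = np.reshape(a_2_3, (2, 3))      # view of a view
c = a_2_3.copy()                   # independent copy

# base points to the array that owns the memory, not the intermediate view
print(b.base is a)                 # True
print(b.base is a_2_3)             # False

# shares_memory answers the question we actually care about
print(np.shares_memory(b, a_2_3))  # True
print(np.shares_memory(c, a))      # False
```

np.shares_memory solves the overlap problem exactly, which can be slow for large, oddly strided arrays; np.may_share_memory is a cheaper check that only looks at memory bounds and may report false positives.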
Because a copy is a separate array in memory, modifying it does not change the original array.
a = np.arange(1, 7)
# create a copy
a_copy = np.reshape(a, (2,3)).copy()
a_copy[1,1] = 666
print('a ', a)
print('a_copy ', a_copy)
Change an element in a copy and in a view, then print the original array¶
a = np.arange(1, 7)
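# astype returns a new array by default: here a float64 copy of a
a_copy = np.reshape(a, (2,3)).astype('float64')
# reshape returns a view on a's data
a_view = np.reshape(a, (2,3))
a_copy[1,1] = 666.44
a_view[1,1] = 101.6555 # the data type of the original array (int) stays the same, so the value is truncated to 101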
print('a ', a)
print('a_view: ', a_view)
print('a_view strides: ', a_view.strides)
print('a_copy ', a_copy)
print('a_copy strides: ', a_copy.strides)
print('a_copy base: ', a_copy.base)
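Changing the copy leaves a untouched. The same operation on a view, however, carries the change into the original array: a now contains 101, the float value truncated to a's integer dtype.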
Take-away: you do need to know if you are using a view or a copy, particularly when you are operating on the array in-place
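One common trap with in-place operations, sketched below with a throwaway array: an augmented assignment such as v += 1 writes through the view, whereas v = v + 1 builds a brand-new array and merely rebinds the name v, so the original no longer sees later changes.

```python
import numpy as np

a = np.zeros(4)
v = a[:2]        # view on the first two elements

v += 1           # in-place: writes through the view into a
print(a)         # [1. 1. 0. 0.]

v = v + 1        # not in-place: creates a new array and rebinds the name v
print(a)         # a is unchanged: [1. 1. 0. 0.]
print(np.shares_memory(v, a))  # False: v is no longer a view of a
```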
1.2.1 Strides - why does some indexing give copies and some give views?¶
- How does numpy arrange data in memory? When you create an array, numpy allocates a contiguous block of memory whose size depends on the number of elements and on the dtype you choose.
a = np.arange(9).reshape(3,3)
print(a)
a.itemsize
In this example each item takes 8 bytes (the default integer dtype, typically int64 on 64-bit platforms).
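The number of bytes per item is fixed by the dtype, not by the values stored. A small sketch comparing a few common dtypes for the same 3x3 shape; nbytes is simply itemsize times the number of elements:

```python
import numpy as np

# itemsize is fixed by the dtype, not by the values stored in the array
sizes = {}
for dtype in ["int8", "int16", "int32", "int64", "float32", "float64", "complex128"]:
    arr = np.zeros((3, 3), dtype=dtype)
    sizes[dtype] = (arr.itemsize, arr.nbytes)  # nbytes = itemsize * arr.size
    print(dtype, arr.itemsize, arr.nbytes)
```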
Memory is linear, which means that in memory the 2-D array is stored as a single sequence of 9 items, 0 1 2 3 4 5 6 7 8, one after another (blue boxes).
However, the user 'sees' the array in 2-D, as 3 rows of 3 items (green boxes).
How does numpy accomplish this? By defining strides:
a.strides
Strides tell you by how many bytes you have to move in memory to take one step along each dimension.
To go from the first item in the first row to the first item in the second row, you need to move 3*8 = 24 bytes. To move to the next column within the same row, you only need to move 8 bytes.
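Put as a formula, the item a[i, j] sits at byte offset i*strides[0] + j*strides[1] from the start of the data. The sketch below checks this for the 3x3 example; it fixes the dtype to int64 so that the strides are (24, 8) on any platform, and byte_offset is a hypothetical helper written only for this illustration.

```python
import numpy as np

a = np.arange(9, dtype=np.int64).reshape(3, 3)
s0, s1 = a.strides                 # (24, 8) for 8-byte items

def byte_offset(i, j):
    """Hypothetical helper: byte offset of a[i, j] from the start of a's data."""
    return i * s0 + j * s1

flat = a.ravel()                   # contiguous 1-D view of the same buffer
for i in range(3):
    for j in range(3):
        # dividing the byte offset by itemsize gives the position in the flat buffer
        assert flat[byte_offset(i, j) // a.itemsize] == a[i, j]

print(byte_offset(1, 0), byte_offset(0, 1))  # 24 8
```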
Views are created when numpy reads the same data buffer with a different offset, shape or strides. Slicing and regular indexing allow that, because the selected elements lie at regular byte intervals, so numpy knows how many bytes to step to reach each of them.
Fancy indexing does not allow that, because in general the data you ask for cannot be obtained just by changing the strides. Thus, numpy needs to make a copy of it in memory.
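A sketch comparing the two routes to the same rows of a 3x4 array (int64 fixed so the byte counts are predictable): the slice keeps the buffer and just doubles the row stride, while fancy indexing gathers the rows into a new contiguous array. Note that numpy copies here even though rows 0 and 2 happen to be evenly spaced; advanced indexing always returns a copy and does not try to detect such cases.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)   # strides (32, 8)

rows_slice = x[::2]        # rows 0 and 2 via slicing
rows_fancy = x[[0, 2]]     # the same rows via fancy indexing

# The slice just doubles the row stride over the same buffer...
print(rows_slice.strides, np.shares_memory(rows_slice, x))   # (64, 8) True
# ...while fancy indexing gathers the rows into a fresh contiguous array
print(rows_fancy.strides, np.shares_memory(rows_fancy, x))   # (32, 8) False
```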
You can also change the strides of an existing array directly.
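# swap the strides: step 8 bytes to go down a row and 24 bytes to go across a column,
# so the same buffer is now read as the transpose
a.strides = (8, 24)
print(a)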
But be careful! Changing the strides to something nonsensical will give you nonsense, and numpy will not complain.
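# a stride of 9 bytes does not line up with the 8-byte items, so the values read are garbage
a.strides = (8, 9)
print(a)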
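Instead of overwriting a.strides in place, the NumPy documentation recommends numpy.lib.stride_tricks.as_strided for building a new view with custom strides, leaving the original array alone. A minimal sketch that reproduces the transpose this way; as_strided does not validate the strides either, so the warning above still applies, and writeable=False is passed here as a precaution against writing through a bad view.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(9, dtype=np.int64).reshape(3, 3)

# New view with the two strides swapped: it reads a column by column
t = as_strided(a, shape=(3, 3), strides=(a.strides[1], a.strides[0]), writeable=False)
print(t)
print(np.array_equal(t, a.T))   # True: same result as the transpose
print(a.strides)                # a itself is untouched: (24, 8)
```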
Exercises on indexing, views/copies¶
Exercise 1: indexing, dimensionality of the output, view or copy?¶
Look at the following code examples and, before running them, try to answer for each case: (1) what is the dimensionality of the result? (2) is the result a view or a copy?
x = np.arange(0,12).reshape(3,4)
x[::2, :] #dim, view or copy
#is_view(x[::2, :], x.base)
x[1, :]
#is_view(x[1, :], x.base)
x[1]
#is_view(x[1], x.base)
print(x)
x[[1, 2, 0], [1, 1, 2]]
#is_view(x[[1, 2, 0], [1, 1, 2]], x.base)
Fancy indexing¶
x[[0, 2], :]
#is_view(x[[0, 2], :], x.base)
x.reshape((6, 2))
#is_view(x.reshape((6, 2)), x.base)
x.ravel()
#is_view(x.ravel(), x.base)
x.T.ravel()
#is_view(x.T.ravel(), x.base)
x[(x % 2) == 1]
#is_view(x[(x % 2) == 1], x.base)
y = x + 2
# arithmetic on an array: view or copy?
#is_view(y, x)
y = np.sort(x, axis=1)
#is_view(y, x)
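Once you have written down your answers, a small checker can confirm them. This is a sketch with a hypothetical helper, describe, that reports the number of dimensions and whether the result shares memory with x (treated here as "view"); using np.shares_memory avoids the .base subtlety that makes the commented checks compare against x.base, since x is itself a view of the arange array.

```python
import numpy as np

def describe(result, original):
    """Hypothetical helper: (ndim, 'view' or 'copy') for a result derived from original."""
    kind = "view" if np.shares_memory(result, original) else "copy"
    return result.ndim, kind

x = np.arange(0, 12).reshape(3, 4)

cases = {
    "x[::2, :]": x[::2, :],
    "x[1, :]": x[1, :],
    "x[1]": x[1],
    "x[[1, 2, 0], [1, 1, 2]]": x[[1, 2, 0], [1, 1, 2]],
    "x[[0, 2], :]": x[[0, 2], :],
    "x.reshape((6, 2))": x.reshape((6, 2)),
    "x.ravel()": x.ravel(),
    "x.T.ravel()": x.T.ravel(),
    "x[(x % 2) == 1]": x[(x % 2) == 1],
    "x + 2": x + 2,
    "np.sort(x, axis=1)": np.sort(x, axis=1),
}

for expr, result in cases.items():
    print(expr, describe(result, x))
```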
Sources + other resources¶
ASPP Bilbao 2022 - Lisa Schwetlick & Aina Frau-Pascual https://github.com/ASPP/2022-bilbao-advanced-numpy
Scipy lecture notes, 2022.1
- Basic Numpy: http://scipy-lectures.org/intro/numpy/index.html
- Advanced Numpy: http://scipy-lectures.org/advanced/advanced_numpy/index.html
Numpy chapter in "Python Data Science Handbook" https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html
Further resources on strides: