2024-heraklion-data/notebooks/.ipynb_checkpoints/notebook-2-numpy_SOLUTIONS-checkpoint.ipynb
2024-08-27 15:27:53 +03:00

Numpy arrays

In [1]:
import numpy as np

Numpy arrays in memory (representation)

reminder

In [4]:
X = np.arange(0,9).reshape(3,3)
print(X)
[[0 1 2]
 [3 4 5]
 [6 7 8]]

memory_lists1

In [5]:
# flatten
X.ravel()
Out[5]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [8]:
# transpose
X.T
Out[8]:
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])
In [9]:
# slice
X[::2, ::2]
Out[9]:
array([[0, 2],
       [6, 8]])

numpy_meta

numpy_magic

Views and Copies: an important distinction!

View

  • a new array object that accesses the same data buffer, without copying it
  • regular indexing and slicing give views
  • in-place operations can be done in views

Copy

  • when a new array is created by duplicating the data buffer as well as the array metadata
  • fancy indexing always gives copies
  • a copy can be forced with the .copy() method
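
A minimal sketch of the distinction, assuming nothing beyond NumPy itself. The variable names are illustrative; `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(6)

v = a[1:4]          # basic slicing: a view on a's buffer
c = a[[1, 2, 3]]    # fancy indexing: data copied into a new buffer
f = a[1:4].copy()   # explicit copy

v_shares = np.shares_memory(a, v)
c_shares = np.shares_memory(a, c)
f_shares = np.shares_memory(a, f)
print(v_shares, c_shares, f_shares)  # True False False

v += 10             # in-place operation on the view also changes a
print(a.tolist())   # [0, 11, 12, 13, 4, 5]
print(c.tolist())   # [1, 2, 3] (the copy is unaffected)
```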

How to tell? Check the base attribute

In [11]:
def is_view(a, x): # checks whether the base of a is the array x itself
    return a.base is x
In [12]:
a = np.arange(1,7)
print('a = ',a)
a =  [1 2 3 4 5 6]
In [13]:
# create slice of a and print its base
a_slice = a[2:5]

print('a_slice = ', a_slice)
print('The base of a_slice is ', a_slice.base)

print('Is a_slice a view of a?', is_view(a_slice, a))
a_slice =  [3 4 5]
The base of a_slice is  [1 2 3 4 5 6]
Is a_slice a view of a? True
In [14]:
# create a copy of a and print its base
a_copy = np.reshape(a, (2,3)).copy()

print('a_copy = ', a_copy)
print('The base of a_copy is ', a_copy.base)
print('Is a_copy a view of a?', is_view(a_copy, a))
a_copy =  [[1 2 3]
 [4 5 6]]
The base of a_copy is  None
Is a_copy a view of a? False
In [15]:
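# reshape returns a view whenever it can, so neither array below is a copy
a_2_3 = np.reshape(a, (2,3))

b = np.reshape(a_2_3, (2,3))
print(b)
print(is_view(b, a))
print(is_view(a_2_3, a))
# False: the base of a view of a view is the array that owns the data (a),
# not the intermediate view a_2_3
print(is_view(b, a_2_3))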
[[1 2 3]
 [4 5 6]]
True
True
False
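
The result above follows from how `base` behaves for chained views: it refers to the array that owns the memory, not to the intermediate view. The sketch below restates this and shows `np.shares_memory` as a check that does not depend on which object `base` points to. It assumes a reasonably recent NumPy, consistent with the output shown above.

```python
import numpy as np

a = np.arange(1, 7)
a_2_3 = a.reshape(2, 3)      # view of a
b = a_2_3.reshape(2, 3)      # view of a view

base_is_owner = b.base is a
base_is_intermediate = b.base is a_2_3
same_buffer = np.shares_memory(b, a_2_3)

print(base_is_owner)         # True: base points at the owner of the data
print(base_is_intermediate)  # False
print(same_buffer)           # True: all three arrays use one buffer
```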

Since a copy is a different array in memory, modifying it will not change the base array

In [17]:
a = np.arange(1, 7)

#  create a copy
a_copy = np.reshape(a, (2,3)).copy()

a_copy[1,1] = 666

print('a ', a)
print('a_copy ', a_copy)
a  [1 2 3 4 5 6]
a_copy  [[  1   2   3]
 [  4 666   6]]

Change an element in a copy and in a view, then print the original array

In [18]:
a = np.arange(1, 7)

#  create a copy (astype returns a new array by default) and a view
a_copy = np.reshape(a, (2,3)).astype('float64')
a_view = np.reshape(a, (2,3))

a_copy[1,1] = 666.44
a_view[1,1] = 101.6555 # the original array keeps its int dtype, so the value is truncated to 101

print('a ', a)
print('a_view: ', a_view)
print('a_view strides: ', a_view.strides)
print('a_copy ', a_copy)
print('a_copy strides: ', a_copy.strides)
print('a_copy base: ', a_copy.base)
a  [  1   2   3   4 101   6]
a_view:  [[  1   2   3]
 [  4 101   6]]
a_view strides:  (24, 8)
a_copy  [[  1.     2.     3.  ]
 [  4.   666.44   6.  ]]
a_copy strides:  (24, 8)
a_copy base:  None

The same operation on a view, however, propagates the change to the base array

Take-away: you do need to know if you are using a view or a copy, particularly when you are operating on the array in-place
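
A small sketch of why this matters for in-place code. The helper `center_inplace` is hypothetical and exists only for this illustration: the same call either modifies the original data or silently does nothing to it, depending on how the argument was indexed.

```python
import numpy as np

def center_inplace(arr):
    """Hypothetical helper: subtract the mean of arr in place."""
    arr -= arr.mean()

data = np.array([1.0, 2.0, 3.0, 4.0])

center_inplace(data[:2])       # slice -> view, so data changes
after_view = data.tolist()
print(after_view)              # [-0.5, 0.5, 3.0, 4.0]

center_inplace(data[[2, 3]])   # fancy index -> temporary copy, data unchanged
after_copy = data.tolist()
print(after_copy)              # [-0.5, 0.5, 3.0, 4.0]
```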

1.2.1 Strides - why does some indexing give copies and other indexing views?

  • How does numpy arrange data in memory? When you create an array, numpy allocates a block of memory whose size depends on the number of items and on the data type you choose
In [20]:
a = np.arange(9).reshape(3,3)
print(a)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
In [21]:
a.itemsize
Out[21]:
8

In this example the array has 8 bytes allocated per item.
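
The 8 bytes come from the integer dtype used here. As a rough illustration with explicitly chosen dtypes (the dtypes are picked for the example, not taken from the notebook), the item size, the total buffer size, and the strides all scale with the dtype:

```python
import numpy as np

layout = {}
for dtype in (np.int8, np.int32, np.float64):
    arr = np.arange(9, dtype=dtype).reshape(3, 3)
    # nbytes is the size of the data buffer: number of items * itemsize
    layout[arr.dtype.name] = (arr.itemsize, arr.nbytes, arr.strides)
    print(arr.dtype.name, arr.itemsize, arr.nbytes, arr.strides)
# int8 1 9 (3, 1)
# int32 4 36 (12, 4)
# float64 8 72 (24, 8)
```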

Memory is linear. That means the 2-D array is laid out in memory as one contiguous sequence, something like this (blue boxes)

[Figure: linear_mem]

However, the user 'sees' the array in 2D (green boxes).

How does numpy accomplish this? By defining strides.

In [22]:
a.strides
Out[22]:
(24, 8)

Strides tell you by how many bytes you should move in memory when moving one step in that dimension.

[Figure: strides]

To go from the first item in the first row to the first item in the second row, you need to move 3*8 = 24 bytes. To move from one column to the next within a row, you only need to move 8 bytes.
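
The same arithmetic works for any element. In this sketch `byte_offset` is an illustrative helper, and the dtype is fixed to int64 so that the strides are (24, 8) on every platform:

```python
import numpy as np

a = np.arange(9, dtype=np.int64).reshape(3, 3)
row_stride, col_stride = a.strides      # (24, 8) for int64

def byte_offset(i, j):
    """Byte offset of a[i, j] from the start of the data buffer."""
    return i * row_stride + j * col_stride

flat = a.ravel()                        # the linear layout in memory
offset = byte_offset(2, 1)              # 2*24 + 1*8 = 56
print(offset, flat[offset // a.itemsize], a[2, 1])   # 56 7 7
```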

Views are created when the same data can be read with different strides. Slicing and regular indexing allow that, because the step in bytes between consecutive items is constant along each dimension.
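
For illustration, here are the strides of a few views of the same array (int64 is set explicitly so the numbers are reproducible). None of them copies data; each only reads the buffer with different steps:

```python
import numpy as np

a = np.arange(9, dtype=np.int64).reshape(3, 3)

print(a.strides)             # (24, 8)
print(a[::2, ::2].strides)   # (48, 16): skip every other row and column
print(a.T.strides)           # (8, 24): rows and columns swapped
print(a[:, 1].strides)       # (24,): one column, stepping row by row
```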

Fancy indexing does not allow that, because the items you are asking for are generally not evenly spaced in memory and so cannot be obtained by just changing the strides. Thus, numpy needs to make a copy of them in memory.
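
A concrete case, as a sketch: the three items picked below sit at byte offsets with unequal gaps, so no single stride can describe them. The index lists are chosen only for this example.

```python
import numpy as np

a = np.arange(9, dtype=np.int64).reshape(3, 3)

rows, cols = [0, 1, 2], [0, 2, 1]
picked = a[rows, cols]                  # items a[0,0], a[1,2], a[2,1]

# Byte offsets of the picked items inside a's buffer
offsets = [i * a.strides[0] + j * a.strides[1] for i, j in zip(rows, cols)]
gaps = [offsets[1] - offsets[0], offsets[2] - offsets[1]]

print(picked.tolist())                  # [0, 5, 7]
print(offsets, gaps)                    # [0, 40, 56] [40, 16]
print(np.shares_memory(a, picked))      # False: numpy had to copy
```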

Now, you can change the strides of an array at will.

In [23]:
a.strides=(8,24)
a
Out[23]:
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

But be careful! Changing the strides to something nonsensical will also give you nonsense. And numpy will not complain.

In [24]:
a.strides=(8, 9)
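
The NumPy documentation discourages assigning to `.strides` directly and points to `numpy.lib.stride_tricks.as_strided` instead. A minimal sketch of that alternative: it builds a new view with the requested strides and leaves the original array untouched. Passing `writeable=False` is a precaution chosen for this example. Like direct assignment, `as_strided` does not check that the strides make sense.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(9, dtype=np.int64).reshape(3, 3)

# Same layout as a.strides = (8, 24), but as a separate read-only view
a_t = as_strided(a, shape=(3, 3), strides=(8, 24), writeable=False)

print(np.array_equal(a_t, a.T))   # True
print(a.strides)                  # (24, 8): a itself is unchanged
```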

Exercises on indexing, views/copies

Exercise 1: indexing, dimensionality of the output, view or copy?

Look at the following code examples and, before running them, try to answer for each case:

(1) What is the dimensionality of the result?
(2) Is the result a view or a copy?

In [25]:
x  = np.arange(0,12).reshape(3,4)
In [26]:
x[::2, :] #dim, view or copy

#is_view(x[::2, :], x.base)
Out[26]:
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])
In [27]:
x[1, :]

#is_view(x[1, :], x.base)
Out[27]:
array([4, 5, 6, 7])
In [45]:
x[1]

#is_view(x[1], x.base)
Out[45]:
array([4, 5, 6, 7])
In [59]:
print(x)
x[[1, 2, 0], [1, 1, 2]]

#is_view(x[[1, 2, 0], [1, 1, 2]], x.base)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Out[59]:
array([5, 9, 2])

Fancy indexing

fancy

In [61]:
x[[0, 2], :]

is_view(x[[0, 2], :], x.base)
Out[61]:
False
In [62]:
x.reshape((6, 2))

is_view(x.reshape((6, 2)), x.base)
Out[62]:
True
In [ ]:
x.ravel()

#is_view(x.ravel(), x.base)
In [ ]:
x.T.ravel()

#is_view(x.T.ravel(), x.base)
In [68]:
x[(x % 2) == 1]

#is_view(x[(x % 2) == 1], x.base)
Out[68]:
array([ 1,  3,  5,  7,  9, 11])
In [82]:
y = x + 2

# arithmetic creates a new array that owns its data, so y.base is None

is_view(y, x)
Out[82]:
False
In [83]:
y = np.sort(x, axis=1)

is_view(y, x)
Out[83]:
False
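
One way to check all the answers at once, as a sketch rather than the notebook's own solution: `np.shares_memory` is used here instead of `is_view`, so the check does not depend on which object `base` refers to. The `cases` dictionary is just a convenience for this example.

```python
import numpy as np

x = np.arange(0, 12).reshape(3, 4)

cases = {
    "x[::2, :]": x[::2, :],
    "x[1, :]": x[1, :],
    "x[1]": x[1],
    "x[[1, 2, 0], [1, 1, 2]]": x[[1, 2, 0], [1, 1, 2]],
    "x[[0, 2], :]": x[[0, 2], :],
    "x.reshape((6, 2))": x.reshape((6, 2)),
    "x.ravel()": x.ravel(),
    "x.T.ravel()": x.T.ravel(),
    "x[(x % 2) == 1]": x[(x % 2) == 1],
    "x + 2": x + 2,
    "np.sort(x, axis=1)": np.sort(x, axis=1),
}

results = {}
for label, v in cases.items():
    # A result that overlaps x in memory is a view; otherwise it is a copy
    kind = "view" if np.shares_memory(x, v) else "copy"
    results[label] = (v.ndim, kind)
    print(f"{label:26s} ndim={v.ndim}  {kind}")
```

Note that `x.ravel()` is a view because `x` is contiguous in memory, while `x.T.ravel()` has to copy: the transposed order cannot be read from the buffer with a single stride.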