Intro to Numerical Computing with NumPy

by Alec

Posted on August 11, 2018 at 12:28 PM

a = [1, 2, 3, 4]

b = [10, 11, 12, 13]

a + b

[1, 2, 3, 4, 10, 11, 12, 13]

output = []

for item1, item2 in zip(a, b):
    output.append(item1 + item2)

output

[11, 13, 15, 17]

g = list(range(1000000))

%timeit sum(g)

17.5 ms ± 932 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

import numpy as np

g_array = np.array(g)

%timeit np.sum(g_array)

1.98 ms ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The reason for the speed increase: a Python list holds a list of addresses that point to objects elsewhere in memory. A NumPy array stores all of its integers together in one contiguous, typed array, so the interpreter does not have to type-check and dereference each integer.
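One rough way to see this storage difference is to compare how many bytes each container spends per element. This is a minimal sketch, assuming CPython and an explicit int64 dtype. The names values_list and values_array are illustrative, and the exact size of a Python int object varies by interpreter version.

```python
import sys

import numpy as np

values_list = list(range(1000))
values_array = np.array(values_list, dtype=np.int64)

# The array stores raw 8-byte integers back to back in a single buffer.
bytes_per_array_item = values_array.itemsize
array_buffer_bytes = values_array.nbytes  # size * itemsize

# The list stores references; each one points to a separate Python int
# object that carries its own type information and reference count.
bytes_per_int_object = sys.getsizeof(values_list[-1])

print(bytes_per_array_item, array_buffer_bytes, bytes_per_int_object)
```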

a = np.array([1, 2, 3, 4])
b = np.array([10, 11, 12, 13])

array([1, 2, 3, 4])

array([10, 11, 12, 13])

a + b

array([11, 13, 15, 17])

a * b

array([10, 22, 36, 52])

a / b

array([0.1       , 0.18181818, 0.25      , 0.30769231])

a ** b

array([       1,     2048,   531441, 67108864])

type(a)

numpy.ndarray

a.dtype

dtype('int64')

a.ndim

a.shape

(4,)

The shape has a trailing comma because it is a tuple with one element.
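The trailing comma is plain Python tuple syntax rather than anything specific to NumPy. A small sketch, with illustrative variable names:

```python
import numpy as np

vector = np.array([1, 2, 3, 4])

one_element_tuple = (4,)   # the comma is what makes this a tuple
just_an_int = (4)          # parentheses alone only group; this is the int 4

print(vector.shape == one_element_tuple)
print(type(one_element_tuple), type(just_an_int))
```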

a.itemsize

a.nbytes

a * 100

array([100, 200, 300, 400])

np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

np.log(a)

array([0.        , 0.69314718, 1.09861229, 1.38629436])

np.exp(a)

array([ 2.71828183,  7.3890561 , 20.08553692, 54.59815003])

np.log

<ufunc 'log'>

a[0]

a[0] = 10

array([10,  2,  3,  4])

a[0] = 10.6

array([10,  2,  3,  4])

The array's dtype is integer, so the assigned float is truncated.
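The sketch below repeats that assignment on a fresh array and shows one way to avoid losing the fractional part, namely converting to a float array first. The names int_array and float_array are illustrative.

```python
import numpy as np

int_array = np.array([1, 2, 3, 4])
int_array[0] = 10.6    # the fractional part is dropped, not rounded
int_array[1] = -2.9    # truncation is toward zero

# astype returns a new array with a float dtype that can hold the value.
float_array = int_array.astype(float)
float_array[0] = 10.6

print(int_array)
print(float_array)
```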

c = np.array([1, 2, 3, 4])

c.fill(0)
c

array([0, 0, 0, 0])

a = np.array([1, 2, 3, 4.0], dtype='int32')
a.dtype

dtype('int32')

array([1, 2, 3, 4], dtype=int32)

c = np.array([[10, 11, 12], [20, 21, 22]])
c

array([[10, 11, 12],
       [20, 21, 22]])

c.dtype

dtype('int64')

c.ndim

c.shape

(2, 3)

2 rows and 3 columns. (rows, columns)

The default ‘building block’ of NumPy is an array, not a matrix. The default memory layout is row-major.
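To make "row-major" concrete, the sketch below flattens the same 2 x 3 array in row-major (C) order and in column-major (Fortran) order. The name grid is illustrative.

```python
import numpy as np

grid = np.array([[10, 11, 12], [20, 21, 22]])

# Row-major (C) order: read along each row before moving to the next row.
# This matches how a freshly created array is laid out in memory.
row_major = grid.ravel(order="C")

# Column-major (Fortran) order: read down each column instead.
column_major = grid.ravel(order="F")

print(row_major)
print(column_major)
print(grid.flags["C_CONTIGUOUS"])
```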

a.T

array([1, 2, 3, 4], dtype=int32)

This does not produce a column vector, as it would in MATLAB.
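If a column vector is what you want, the second axis has to be added explicitly. Two common ways are sketched below; the variable names are illustrative.

```python
import numpy as np

vec = np.array([1, 2, 3, 4])

# Transposing a 1-D array leaves the shape unchanged.
print(vec.T.shape)

# Add a second axis explicitly to get a 4 x 1 column.
column = vec[:, np.newaxis]
also_column = vec.reshape(-1, 1)   # same result via reshape

print(column.shape)
```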

c.size

c[0, 0]

c[0]

array([10, 11, 12])

a = np.array([10, 11, 12, 13, 14])
a

array([10, 11, 12, 13, 14])

negative index:  -5  -4  -3  -2  -1
positive index:   0   1   2   3   4

a[1:3]

array([11, 12])

a[1:-2]

array([11, 12])

a[-4:3]

array([11, 12])

a[:3]

array([10, 11, 12])

a[-2:]

array([13, 14])

a[::2]

array([10, 12, 14])

a = np.zeros((6, 6), dtype='int64')
for i in range(6):
    a[i] = (i+1) * np.array(range(1, 7))
a

array([[ 1,  2,  3,  4,  5,  6],
       [ 2,  4,  6,  8, 10, 12],
       [ 3,  6,  9, 12, 15, 18],
       [ 4,  8, 12, 16, 20, 24],
       [ 5, 10, 15, 20, 25, 30],
       [ 6, 12, 18, 24, 30, 36]])

a[0, 3:5]

array([4, 5])

a[4:, 4:]

array([[25, 30],
       [30, 36]])

a[:, 2]

array([ 3,  6,  9, 12, 15, 18])

Every time you index, you drop a dimension.
Every time you slice, you keep that dimension.
If a colon is present, you are slicing.
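The shapes make this rule easy to check. The sketch below rebuilds a 6 x 6 multiplication table (here with broadcasting, under the illustrative name table) and selects the same column once by index and once by slice.

```python
import numpy as np

# 6 x 6 multiplication table: entry [i, j] is (i + 1) * (j + 1).
table = np.arange(1, 7) * np.arange(1, 7)[:, np.newaxis]

indexed = table[:, 2]     # integer index on axis 1: that axis is dropped
sliced = table[:, 2:3]    # slice on axis 1: the axis is kept with length 1
scalar = table[0, 3]      # two integer indices: both axes are dropped

print(indexed.shape, sliced.shape, np.ndim(scalar))
```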

a[1:4, 1:4]

array([[ 4,  6,  8],
       [ 6,  9, 12],
       [ 8, 12, 16]])

a[2::2, ::2]

array([[ 3,  9, 15],
       [ 5, 15, 25]])

a = np.arange(25).reshape(5, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

red = a[:, 1::2]
print(red)
blue = a[1::2, :3:2]
print(blue)
yellow = a[4]
print(yellow)
yellow = a[-1]
print(yellow)

[[ 1  3]
 [ 6  8]
 [11 13]
 [16 18]
 [21 23]]
[[ 5  7]
 [15 17]]
[20 21 22 23 24]
[20 21 22 23 24]

red[-1, -1] = 0

red

array([[ 1,  3],
       [ 6,  8],
       [11, 13],
       [16, 18],
       [21,  0]])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22,  0, 24]])

id(a)

140687300558080

id(red)

140687300532784

red.flags

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

a.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

red.data

<memory at 0x7ff45d631c18>

a.data

<memory at 0x7ff45d631cf0>

red.copy()

array([[ 1,  3],
       [ 6,  8],
       [11, 13],
       [16, 18],
       [21,  0]])

a = np.arange(0, 80, 10)
a

array([ 0, 10, 20, 30, 40, 50, 60, 70])

mask = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)

mask2 = a < 30

indices = [1, 2, -3]
y = a[indices]
y

array([10, 20, 50])

a[mask2]

array([ 0, 10, 20])

a[mask]

array([10, 20, 50])

a = np.array([3, -1, -2, 4, -6, 8])

array([ 3, -1, -2,  4, -6,  8])

a < 0

array([False,  True,  True, False,  True, False])

negatives = a < 0

a[negatives]

array([-1, -2, -6])

a[a < 0]

array([-1, -2, -6])

a[a < 0] = 0

array([3, 0, 0, 4, 0, 8])

a < 8

array([ True,  True,  True,  True,  True, False])

a > 3

array([False, False, False,  True, False,  True])

a > 3 and a < 8

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-176-a611552ecf48> in <module>()
----> 1 a > 3 and a < 8

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

(a < 8).any()

True

# Logical operators (work on a single truth value)
#   and, or, not
# Bitwise operators (work element by element on arrays)
#   & (and), | (or), ~ (not), ^ (xor)

(a > 3) & (a < 8)

array([False, False, False,  True, False, False])
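The parentheses matter because & binds more tightly than the comparison operators. The sketch below shows the element-wise operators side by side, together with np.logical_and as an equivalent spelling; the variable names are illustrative.

```python
import numpy as np

values = np.array([3, 0, 0, 4, 0, 8])

# Each comparison is parenthesized so it is evaluated before &.
in_range = (values > 3) & (values < 8)
same_result = np.logical_and(values > 3, values < 8)

outside = ~in_range                       # element-wise not
either = (values == 0) | (values == 8)    # element-wise or

print(in_range)
print(outside)
print(either)
```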

array([3, 0, 0, 4, 0, 8])

f = 3

Slicing gives a view of the same data (memory buffer).
Fancy indexing (masking) gives a copy.
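A short sketch of the difference, using np.shares_memory to confirm which result still points at the original buffer. The names base, view and picked are illustrative.

```python
import numpy as np

base = np.arange(0, 80, 10)

view = base[1:4]           # slicing: a view onto base's buffer
picked = base[[1, 2, 3]]   # fancy indexing: a new, independent array

view[0] = -1               # this write is visible through base
picked[1] = -2             # this write leaves base untouched

print(base)
print(np.shares_memory(base, view), np.shares_memory(base, picked))
```

When a slice needs to be independent of the original, calling .copy() on it gives the same isolation as fancy indexing.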

np.nonzero(negatives)

(array([1, 2, 4]),)

a.sort()

array([0, 0, 0, 3, 4, 8])

a = np.array([10, 1, 20])
b = np.array([2, 3, 20])

a > b

array([ True, False, False])

a = np.arange(25).reshape(5, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

mask = a % 3 == 0

mask

array([[ True, False, False,  True, False],
       [False,  True, False, False,  True],
       [False, False,  True, False, False],
       [ True, False, False,  True, False],
       [False,  True, False, False,  True]])

a[mask]

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24])

output = np.empty_like(a, dtype='float')
output.fill(np.nan)
output

array([[nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan]])

output[mask] = a[mask]
output

array([[ 0., nan, nan,  3., nan],
       [nan,  6., nan, nan,  9.],
       [nan, nan, 12., nan, nan],
       [15., nan, nan, 18., nan],
       [nan, 21., nan, nan, 24.]])

np.where(a % 3 == 0, a, np.nan)

array([[ 0., nan, nan,  3., nan],
       [nan,  6., nan, nan,  9.],
       [nan, nan, 12., nan, nan],
       [15., nan, nan, 18., nan],
       [nan, 21., nan, nan, 24.]])

l = [1, 2, 3, 4, 5]
l2 = l[:]

l2[2] = 0

l2

[1, 2, 0, 4, 5]

[1, 2, 3, 4, 5]

Computations with Arrays

Rule 1: Operations between multiple array objects are first checked for proper shape match (see the NumPy documentation on broadcasting).
Rule 2: Mathematical operators (+, -, *, /, exp, log, …) apply element by element, on the values.
Rule 3: Reduction operations (mean, std, skew, kurt, sum, prod, …) apply to the whole array, unless an axis is specified.
Rule 4: Missing values propagate unless explicitly ignored (nanmean, nansum, …).
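The four rules can be sketched on one small array. This is only an illustration under assumed sample values; data and offsets are made-up names, not part of the original notebook.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0]])
offsets = np.array([10.0, 20.0, 30.0])

# Rule 1: shapes (2, 3) and (3,) are compatible, so offsets is
# broadcast across each row.
shifted = data + offsets

# Rule 2: operators act element by element.
doubled = data * 2

# Rule 3: a reduction covers the whole array unless an axis is given.
# Rule 4: nan propagates through sum, while nansum ignores it.
total = np.sum(data)
total_ignoring_nan = np.nansum(data)
column_sums = np.nansum(data, axis=0)

print(shifted[0], doubled[0])
print(total, total_ignoring_nan, column_sums)
```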

For an array of shape (2, 3):
reducing along axis 0 gives a result of shape (3,)
reducing along axis 1 gives a result of shape (2,)

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(a, axis=0))
print(np.sum(a, axis=1))

[5 7 9]
[ 6 15]

a = np.arange(24).reshape(6, 4)

a.shape

(6, 4)
