import numpy as np

py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

# Pick one
x = py_list   # OR
x = np_array
5 Storing Data (Good)
5.1 What to expect in this chapter
You should now know how to store data using lists, arrays and dictionaries. I will now show you more details on accessing and modifying these structures. This is important because most of what you do with programming is related to accessing and changing data. You will also gain a better understanding of the differences and similarities between lists, NumPy arrays and dictionaries.
5.2 Subsetting: Indexing and Slicing
You will often need to select a subset of the data in a list (or array). This is called subsetting. One form of this is picking a single element, called indexing (you already know how to do this from the previous chapter). Another option is to select a range of elements. This is called slicing.
So, in summary:
- Subsetting means to ‘select’.
- Indexing refers to selecting one element.
- Slicing refers to selecting a range of elements.
5.2.1 Indexing & Slicing 1D (Lists & Arrays)
The following applies to both lists and arrays.
Setup
Indexing and slicing 1D lists and arrays
Since slicing gives us a range of elements, we must specify two indices to indicate where to start and end. The various syntaxes for these are shown in the table below.
Syntax | Result | Example | Note
---|---|---|---
x[0] | First element | 'a1' |
x[-1] | Last element | 'j10' |
x[0:3] | Index 0 to 2 | ['a1','b2','c3'] | Gives \(3-0=3\) elements
x[1:6] | Index 1 to 5 | ['b2','c3','d4','e5','f6'] | Gives \(6-1=5\) elements
x[1:6:2] | Index 1 to 5 in steps of 2 | ['b2','d4','f6'] | Gives every other of the \(6-1=5\) elements
x[5:] | Index 5 to the end | ['f6','g7','h8','i9','j10'] | Gives len(x) \(-5=5\) elements
x[:5] | Index 0 to 4 | ['a1','b2','c3','d4','e5'] | Gives \(5-0=5\) elements
x[5:2:-1] | Index 5 to 3 (i.e. in reverse) | ['f6','e5','d4'] | Gives \(5-2=3\) elements
x[::-1] | Reverses the list | ['j10','i9','h8',...,'b2','a1'] |
Remember, slicing in Python can be a bit tricky. If you slice with [i:j], the slice will start at i and end at j-1, giving you a total of j-i elements.
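The [i:j] rule can be checked directly. The following is a minimal sketch that reuses the ten-element list from the setup; the names i, j, list_slice and array_slice are only illustrative.

```python
import numpy as np

py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

i, j = 1, 6
list_slice = py_list[i:j]        # Starts at index i, ends at index j-1
array_slice = np_array[i:j]      # Same syntax, same elements

print(list_slice)                        # ['b2', 'c3', 'd4', 'e5', 'f6']
print(len(list_slice) == j - i)          # True: the slice has j-i elements
print(list_slice[-1] == py_list[j - 1])  # True: the last element is at index j-1
```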
5.2.2 Subsetting by masking (Arrays only)
One of the most powerful things you can do with NumPy arrays is subsetting by masking. To make sense of this, consider the following.
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
print(my_mask)
[False False False  True  True  True  True  True  True  True]
The answer to my question (which elements are greater than 3?) is in a ‘Yes’/‘No’ or True/False format. I can use this True/False format to ask NumPy to show me only the elements that are True:
np_array[my_mask]
array([ 4, 5, 6, 7, 8, 9, 10])
This is why I used the term ‘masking’. The True/False answer acts like a mask, allowing only the True subset to be seen.
Remember that subsetting by masking only works with NumPy arrays.
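To see the difference between lists and arrays here, the sketch below tries to apply a mask to a plain list. The flag list_masking_worked and the list comprehension are my own additions for illustration; the comprehension is just one possible way to get the same subset from a list.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_mask = np.array(py_list) > 3       # Build the mask using an array

try:
    py_list[my_mask]                  # Lists do not understand masks
    list_masking_worked = True
except TypeError as error:
    list_masking_worked = False
    print("Lists cannot be masked:", error)

# One way (an assumption, not the only option) to subset a list by a condition
subset = [value for value in py_list if value > 3]
print(subset)                         # [4, 5, 6, 7, 8, 9, 10]
```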
Instead of creating another variable, I can also do all of this succinctly as:
np_array[np_array > 3]
array([ 4, 5, 6, 7, 8, 9, 10])
Let me show you a few more quick examples.
- Let’s invert our selection by using the ~. This is called the Bitwise NOT operator.
  np_array[~(np_array > 3)]  # '~' means 'NOT'
  array([1, 2, 3])
- np_array[(np_array > 3) & (np_array < 8)]  # '&' means 'AND'
  array([4, 5, 6, 7])
- np_array[(np_array < 3) | (np_array > 8)]  # '|' means 'OR'
  array([ 1, 2, 9, 10])
Remember that you must use the Bitwise NOT (~), Bitwise OR (|) and Bitwise AND (&) when combining masks with NumPy.
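The sketch below shows what happens if you reach for the keyword and instead of &. The names between and keyword_worked are illustrative. Note also that the parentheses around each comparison are needed, because Python evaluates & before > and <.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Bitwise AND combines the two masks element by element
between = np_array[(np_array > 3) & (np_array < 8)]
print(between.tolist())               # [4, 5, 6, 7]

try:
    np_array[(np_array > 3) and (np_array < 8)]
    keyword_worked = True
except ValueError:
    keyword_worked = False            # 'and' cannot combine whole arrays
print(keyword_worked)                 # False
```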
5.2.3 Indexing & Slicing 2D Lists
Some of you might still need convincing about the differences between lists and arrays. These become more obvious when you try to index and slice higher dimensional lists (and arrays).
Let’s consider the following 2D list.
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]
- py_list_2d[3]  # What is at position 4 (index 3)?
  [4, 'D']
- py_list_2d[3][0]  # FIRST element at position 4 (index 3)
  4
- py_list_2d[:3]
  [[1, 'A'], [2, 'B'], [3, 'C']]
- py_list_2d[:3][0]
  [1, 'A']
  You might think that this will yield the first elements (i.e. [1, 2, 3]) of all the sub-lists up to index 2. No! Instead, it gives the first element of the list you get from py_list_2d[:3]. Arrays work very differently, as you will see in a moment.
- py_list_2d[3:6][0]
  [4, 'D']
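If you do want the first elements of the first three sub-lists from a 2D list, you have to visit each sub-list yourself. The following is a minimal sketch using a list comprehension; the names first_three and first_elements are illustrative, and this is only one way of doing it.

```python
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

first_three = py_list_2d[:3]                       # [[1, 'A'], [2, 'B'], [3, 'C']]
first_elements = [row[0] for row in first_three]   # Pick index 0 of each sub-list
print(first_elements)                              # [1, 2, 3]
```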
5.2.4 Indexing & Slicing 2D Arrays
np_array_2d = np.array([[1, "A"], [2, "B"], [3, "C"], [4, "D"],
                        [5, "E"], [6, "F"], [7, "G"], [8, "H"],
                        [9, "I"], [10, "J"]])
- np_array_2d[3]  # What is at position 4 (index 3)?
  array(['4', 'D'], dtype='<U21')
- np_array_2d[3, 0]  # FIRST element at position 4 (index 3)
  '4'
  Notice how the syntax for arrays uses just a single pair of square brackets ([ ]).
- np_array_2d[:3]
  array([['1', 'A'], ['2', 'B'], ['3', 'C']], dtype='<U21')
- np_array_2d[:3, 0]
  array(['1', '2', '3'], dtype='<U21')
- np_array_2d[3:6, 0]
  array(['4', '5', '6'], dtype='<U21')
- If you want ‘everything’ you just use :.
  np_array_2d[:, 0]
  array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U21')
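Notice that the numbers in the outputs above appear in quotes: because the array mixes numbers and letters, NumPy stores everything as text. The sketch below pulls out each column and converts the first back to integers so that it can be used for masking. The names numbers and letters are illustrative, and using astype(int) is my choice for the conversion.

```python
import numpy as np

np_array_2d = np.array([[1, "A"], [2, "B"], [3, "C"], [4, "D"],
                        [5, "E"], [6, "F"], [7, "G"], [8, "H"],
                        [9, "I"], [10, "J"]])

numbers = np_array_2d[:, 0].astype(int)   # First column, converted to integers
letters = np_array_2d[:, 1]               # Second column, still text

print(numbers[numbers > 7].tolist())      # [8, 9, 10]
print(letters[numbers > 7].tolist())      # ['H', 'I', 'J']
```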
5.2.5 Growing lists
NumPy arrays are invaluable, and their slicing syntax (e.g. [:3,0]) is more intuitive than that of lists. So, why do we even bother with lists? One advantage of lists is how easily and efficiently they grow. NumPy arrays are fantastic for fast math operations, provided you do not change their size [1]. Now let me show you how to grow a list. This will be useful later when you try to solve differential equations numerically.
- x = [1] * 10
  x   # I am lazy to type print()
      # Let Jupyter do the hard work.
  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- Appending one element at a time.
  x = [1]
  x = x + [2]
  x = x + [3]
  x = x + [4]
  x
  [1, 2, 3, 4]
- This works too.
  x = [1]
  x += [2]
  x += [3]
  x += [4]
  x
  [1, 2, 3, 4]
- If you want to append multiple elements in one go.
  x = [1]
  x += [2, 3, 4]
  x
  [1, 2, 3, 4]
- In case you want to prepend stuff.
  x = [1]
  x = [2, 3, 4] + x
  x
  [2, 3, 4, 1]
- Using append() with another list.
  x = [1, 2, 3]
  x.append([4, 5, 6])
  x
  [1, 2, 3, [4, 5, 6]]
- Use extend() if you just want to combine the elements of the lists.
  x = [1, 2, 3]
  x.extend([4, 5, 6])
  x
  [1, 2, 3, 4, 5, 6]
- Finally, you will also find the hidden function append() useful:
  x = [1]
  x.append(2)
  x.append(3)
  x.append(4)
  x
  [1, 2, 3, 4]
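A common pattern that combines the strengths of both structures is to grow a list inside a loop and convert it to an array only once its size is final. The sketch below is a made-up example (each new value is half the previous one); the names values and values_array are illustrative.

```python
import numpy as np

values = [1.0]                       # Start with a single element
for _ in range(5):
    values.append(values[-1] / 2)    # Grow the list one element at a time

values_array = np.array(values)      # Convert once the size is final
print(values)                        # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
print(values_array.sum())            # 1.96875
```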
5.3 Some loose ends
5.3.1 Tuples
Before we end this section, I must introduce you to another data storage structure called a tuple. Tuples are similar to lists, except they use ( ) and cannot be changed after creation.
Let me first create a simple tuple.
a = (1, 2, 3)  # Define tuple
We can access its data…
print(a[0]) # Access data
1
But, we cannot change the data.
# The following will NOT work
a[0] = -1
a[0] += [10]
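If you would like to see the failure without stopping your notebook, you can catch the error. The following is a minimal sketch; the flag tuple_changed and the new tuple b are illustrative additions, and building a new tuple is just one way to get a ‘modified’ version.

```python
a = (1, 2, 3)

try:
    a[0] = -1                        # Tuples do not allow item assignment
    tuple_changed = True
except TypeError as error:
    tuple_changed = False
    print("Cannot change a tuple:", error)

b = (-1,) + a[1:]                    # Build a NEW tuple instead
print(a, b)                          # (1, 2, 3) (-1, 2, 3)
```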
5.3.2 Be VERY careful when copying
Variables in Python have subtle features that require you to be mindful when making copies of lists and arrays.
For example, if you want to copy a list, you might be tempted to do the following.
x = [1, 2, 3]
y = x  # DON'T do this!
z = x  # DON'T do this!
The correct way to do this is as follows:
x = [1, 2, 3]
y = x.copy()
z = x.copy()
Note: At this stage, you only have to know that you must use copy() to be safe; you do not have to understand why.
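For the curious, the sketch below shows what can go wrong without copy(). It only demonstrates the symptom, using the same names x, y and z as above.

```python
x = [1, 2, 3]
y = x                  # y is just another name for the SAME list
z = x.copy()           # z is a separate list with the same elements

y.append(4)            # Changing y...
print(x)               # [1, 2, 3, 4]  ...also changes x!
print(z)               # [1, 2, 3]     The copy is unaffected.
print(y is x, z is x)  # True False
```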
[1] NumPy’s gains in speed are due to it doing things to all the elements in the array in one go. For this, the data needs to be stored in a specific order in memory. Adding or removing elements hinders this optimisation. When you change the size of a NumPy array, NumPy creates a new array and copies the existing data into it, which makes growing an array very inefficient.
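As a small illustration of this footnote, the sketch below uses np.append(), which returns a new array rather than changing the existing one. The names original and grown are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3])
grown = np.append(original, 4)   # Returns a NEW array; original is untouched

print(original.tolist())         # [1, 2, 3]
print(grown.tolist())            # [1, 2, 3, 4]
print(grown is original)         # False
```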