4 Storing Data (Need)

What To Expect In This Chapter
We are learning Python to help us understand and solve science-related problems. To do this, we must interact with information (data) and transform it to yield a solution. So, it is essential to have ways to store and manipulate data easily and efficiently. For this, we need structures beyond the simple variables we have encountered so far.
Python offers a variety of ways to store and manipulate data. You already met lists and dictionaries in the previous chapter. However, there are several more; here is a (non-comprehensive) list.
- Lists
- NumPy arrays
- Dictionaries
- Tuples
- Dataframes
- Classes
This part will cover the first four structures (Python lists, NumPy arrays, dictionaries and tuples). I will show you how to use dataframes in a separate part. Classes are an advanced topic I will touch on in the Nice chapter.
4.1 Lists, Arrays & Dictionaries
4.1.1 Let’s compare
Let me show you how to store the same information (in this case, some superhero data) using lists, arrays and dictionaries.
Python Lists
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
NumPy Arrays
import numpy as np

np_super_names = np.array(["Black Widow", "Iron Man", "Doctor Strange"])
np_real_names = np.array(["Natasha Romanoff", "Tony Stark", "Stephen Strange"])
Dictionary
superhero_info = {"Natasha Romanoff": "Black Widow",
                  "Tony Stark": "Iron Man",
                  "Stephen Strange": "Doctor Strange"
                  }
Dictionaries pair each value with a key, written as key: value (the two are separated by a colon, :).
Notice how elegantly the dictionary holds the real and superhero names in one structure, while we need two lists (or arrays) for the same data. Further, for lists and arrays the order matters: the first real name belongs to the first superhero name, the second to the second, and so on. However, lists (and arrays) offer many features that dictionaries don’t; you will see these in a bit. Which data storage strategy to choose will depend on the problem you are trying to solve. More on this later; for the moment, remember that there are three basic ways of storing data:
- lists,
- NumPy arrays and
- dictionaries.
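To make the comparison concrete, here is a small sketch (not part of the original example) that looks up Tony Stark’s superhero name both ways. With two lists, we first find the position of the real name using the list method index() and then rely on the second list being in the same order; the dictionary goes straight from key to value.

```python
# The same superhero data stored two ways
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
superhero_info = {"Natasha Romanoff": "Black Widow",
                  "Tony Stark": "Iron Man",
                  "Stephen Strange": "Doctor Strange"}

# Lists: find the position of the real name, then use that position
# in the second list (this only works if both lists share the same order)
position = py_real_names.index("Tony Stark")
print(py_super_names[position])       # Iron Man

# Dictionary: use the real name directly as the key
print(superhero_info["Tony Stark"])   # Iron Man
```

If someone reordered only one of the two lists, the list-based lookup would silently return the wrong name; the dictionary cannot get out of step like that.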
By the way,
- You can choose any name for the variables. I have decided to add the prefixes py and np for clarity.
- I am being lazy; when I say ‘arrays’, I mean ‘NumPy arrays’, and when I say ‘lists’, I mean ‘Python lists’.
4.1.2 Accessing data from a list (or array)
To access data in lists (and arrays), we need to use an index corresponding to the data’s position. Python is a zero-indexed language, so it starts counting at 0. This means that to access a particular element in the list (or array), you need to specify its index counting from zero: the first element has index 0, the second has index 1, and so on. The image below shows the relationship between the position and index.

py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
py_real_names = ["Natasha Romanoff", "Tony Stark", "Stephen Strange"]
print(py_real_names[0])    # → Natasha Romanoff
print(py_super_names[0])   # → Black Widow
Using a negative index allows us to count from the back of the list. For instance, using the index -1 will give the last element. This is super useful because we can easily access the last element without knowing the list size.
print(py_super_names[2])    # → Doctor Strange (forward indexing: we need to know the size beforehand)
print(py_super_names[-1])   # → Doctor Strange (reverse indexing)
Data in lists (and arrays) must be accessed using a zero-based index.
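The link between position, forward index and reverse index can also be printed directly. This sketch uses the built-in enumerate(), which hands back each element together with its zero-based index; the variable names size and reverse_index are just illustrative choices.

```python
py_super_names = ["Black Widow", "Iron Man", "Doctor Strange"]
size = len(py_super_names)

# enumerate() gives each element together with its zero-based index
for index, name in enumerate(py_super_names):
    reverse_index = index - size   # the same element, counted from the back
    print(f"position {index + 1}: index {index} (or {reverse_index}) -> {name}")
```

This prints:

position 1: index 0 (or -3) -> Black Widow
position 2: index 1 (or -2) -> Iron Man
position 3: index 2 (or -1) -> Doctor Strange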
4.1.3 Accessing data from a dictionary
Dictionaries hold data (values), each paired with a key. This means you can access a value (in this case, the superhero name) using the corresponding real name as the key. Here is how it works:
superhero_info = {"Natasha Romanoff": "Black Widow",
                  "Tony Stark": "Iron Man",
                  "Stephen Strange": "Doctor Strange"
                  }
print(superhero_info["Natasha Romanoff"])   # → Black Widow
Remember that dictionaries have a key-value structure.
You can access all the keys and values as follows:
superhero_info.keys()     # → dict_keys(['Natasha Romanoff', 'Tony Stark', 'Stephen Strange'])
superhero_info.values()   # → dict_values(['Black Widow', 'Iron Man', 'Doctor Strange'])
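Keys and values can also be retrieved together as pairs with the dictionary method items(), and the in operator checks whether a key exists (asking for a missing key with square brackets raises a KeyError). A short sketch; "Bruce Banner" is just a hypothetical name that is not in the dictionary.

```python
superhero_info = {"Natasha Romanoff": "Black Widow",
                  "Tony Stark": "Iron Man",
                  "Stephen Strange": "Doctor Strange"}

# items() gives each key together with its value
for real_name, super_name in superhero_info.items():
    print(f"{real_name} is {super_name}")

# 'in' checks the keys, not the values
print("Tony Stark" in superhero_info)     # True
print("Bruce Banner" in superhero_info)   # False
```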
4.1.4 Higher dimensional lists
Unlike with the dictionary, we needed two lists to store the corresponding real and superhero names. An obvious way around this is to use a single 2D list (or array), in which each inner list holds one real name and its superhero name:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],
                     ['Tony Stark', 'Iron Man'],
                     ['Stephen Strange', 'Doctor Strange']]
4.2 Lists vs. Arrays
Let me now show you a few quick examples of using lists and arrays. These will allow you to appreciate the versatility that each offers.
4.2.1 Size
Often, you need to know how many elements there are in a list. We can use the len() function for this.
Setup
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]
np_array_2d = np.array(py_list_2d)   # Reusing the Python list to create a NEW NumPy array
2D Lists
len(py_list_2d)       # → 10
2D Arrays
len(np_array_2d)      # → 10
np_array_2d.shape     # → (10, 2)
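Notice the absence of parentheses ( ) after shape above. This is because shape is not a function; instead, it is a property (or attribute) of the NumPy array. Notice also that len() only reports the number of rows (the size of the first dimension), whereas shape gives the size along every dimension.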
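Here is a small sketch comparing len() with a few NumPy array attributes (shape, ndim and size) on a 1D and a 2D array. The arrays are made up for illustration; note that none of the attributes take parentheses.

```python
import numpy as np

np_array_1d = np.array([1, 2, 3, 4, 5])
np_array_2d = np.array([[1, 2, 3],
                        [4, 5, 6]])

print(len(np_array_2d))    # 2      (only the number of rows)
print(np_array_2d.shape)   # (2, 3) (rows and columns)
print(np_array_1d.shape)   # (5,)   (a one-element tuple for a 1D array)
print(np_array_2d.ndim)    # 2      (number of dimensions)
print(np_array_2d.size)    # 6      (total number of elements)
```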
4.2.2 Arrays are fussy about type
Please recall the previous discussion about data types (e.g. int, float, str). One prominent difference between lists and arrays is that arrays insist on a single data type, whereas lists are more accommodating. Consider the following example and notice how the numbers are converted to text (strings, shown in quotes ' ') when we create the NumPy array.
py_list = [1, 1.5, 'A']
np.array(py_list)     # → array(['1', '1.5', 'A'], dtype='<U32')
When dealing with datasets containing both numbers and text, you must be mindful of this restriction. However, this is just an annoyance and not a real problem, as we can easily change the type (typecast) using the ‘hidden’ function astype(). More about this in a later chapter. For the moment, remember that NumPy arrays tolerate only a single type.
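As a preview, here is a minimal sketch of astype() converting numbers stored as text back into floats. It also shows that text which is not a number (like 'A') cannot be converted and raises a ValueError. The arrays are invented for illustration.

```python
import numpy as np

# Mixing numbers and text: everything becomes a string
mixed = np.array([1, 1.5, 'A'])
print(mixed.dtype.kind)    # U  (Unicode string)

# Numbers stored as text can be converted with astype()
numbers_as_text = np.array(['1', '1.5', '2'])
numbers = numbers_as_text.astype(float)
print(numbers.dtype)       # float64
print(numbers.sum())       # 4.5

# Text that is not a number cannot be converted
try:
    mixed.astype(float)
except ValueError as error:
    print("Cannot convert:", error)
```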
4.2.3 Adding a number
Setup
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)   # Reusing the Python list to create a NEW NumPy array
Lists
py_list + 10      # Won't work!
Arrays
np_array + 10     # → array([11, 12, 13, 14, 15])
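If you do want to add 10 to every element of a list, one option (a sketch, not necessarily the approach used later in this book) is a list comprehension, which builds a new list one element at a time. The try/except part just confirms that py_list + 10 fails with a TypeError.

```python
py_list = [1, 2, 3, 4, 5]

# A list comprehension visits each element and builds a new list
py_list_plus_10 = [number + 10 for number in py_list]
print(py_list_plus_10)    # [11, 12, 13, 14, 15]

# Adding a number to a whole list is not allowed
try:
    py_list + 10
except TypeError as error:
    print("TypeError:", error)
```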
4.2.4 Adding another list
Setup
py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]
np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)
Lists
py_list_1 + py_list_2      # → [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
Arrays
np_array_1 + np_array_2    # → array([11, 22, 33, 44, 55])
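Each behaviour can be reproduced with the other structure. In this sketch, the built-in zip() pairs up list elements for element-wise addition, and np.concatenate() joins two arrays end to end, which is what + does for lists.

```python
import numpy as np

py_list_1 = [1, 2, 3, 4, 5]
py_list_2 = [10, 20, 30, 40, 50]
np_array_1 = np.array(py_list_1)
np_array_2 = np.array(py_list_2)

# Element-by-element addition with lists: pair the elements with zip()
py_sum = [a + b for a, b in zip(py_list_1, py_list_2)]
print(py_sum)                # [11, 22, 33, 44, 55]

# Joining arrays end to end needs np.concatenate()
np_joined = np.concatenate([np_array_1, np_array_2])
print(np_joined.tolist())    # [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
```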
4.2.5 Multiplying by a number
Setup
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)
Lists
py_list*2      # → [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
Arrays
np_array*2     # → array([ 2,  4,  6,  8, 10])
So multiplying a list by a number makes the list grow (its contents are repeated), whereas multiplying an array multiplies each of its elements by the number!
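Again, each result is available for the other structure. This sketch doubles the list elements with a list comprehension and repeats an array with np.tile(), NumPy’s counterpart to multiplying a list by a whole number.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# Doubling each element of a list
py_doubled = [number * 2 for number in py_list]
print(py_doubled)               # [2, 4, 6, 8, 10]

# Repeating an array twice
np_repeated = np.tile(np_array, 2)
print(np_repeated.tolist())     # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```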
Squaring
Setup
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)
Lists
py_list**2     # Won't work!
Arrays
np_array**2    # → array([ 1,  4,  9, 16, 25])
4.2.6 Asking questions
Setup
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)
Lists
py_list == 3     # Works, but what IS the question? → False
py_list > 3      # Won't work!
Arrays
np_array == 3    # → array([False, False,  True, False, False])
np_array > 3     # → array([False, False, False,  True,  True])
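Here is a sketch of questions that do make sense for each structure. For lists, the in operator asks whether a value is present. For arrays, the True/False answers can be used to pick out matching elements (so-called boolean masking, a NumPy feature not covered above) or counted, since True counts as 1.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)

# With lists: is 3 one of the elements?
print(3 in py_list)                        # True

# Keeping only the elements greater than 3
print([n for n in py_list if n > 3])       # [4, 5]
print(np_array[np_array > 3].tolist())     # [4, 5]

# Counting how many elements are greater than 3
print((np_array > 3).sum())                # 2
```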
4.2.7 Mathematics
Setup
py_list = [1, 2, 3, 4, 5]
np_array = np.array(py_list)
Lists
sum(py_list)       # sum() is a base Python function → 15
max(py_list)       # max() is a base Python function → 5
min(py_list)       # min() is a base Python function → 1
py_list.sum()      # Won't work!
Arrays
np_array.sum()     # → 15
np_array.mean()    # → 3.0
np_array.max()     # → 5
np_array.min()     # → 1
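Two related details are worth knowing. First, NumPy’s functions np.sum() and np.mean() also accept a plain list, which they convert to an array internally. Second, there is no built-in mean() function for lists, but the mean is easy to compute from sum() and len(). A short sketch:

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]

# NumPy functions accept a list directly
print(np.sum(py_list))               # 15
print(np.mean(py_list))              # 3.0

# The mean of a list using only built-in functions
print(sum(py_list) / len(py_list))   # 3.0
```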
Roughly speaking, an operation on a list works on the list as a whole. In contrast, an operation on an array works on each individual element of the array.