How to Work with the Copy Function

In this guide, we're going to discuss NumPy's copy and view functions, as well as how NumPy manages memory for each.

Let's start by creating a variable named array_a and assigning it a NumPy array with a range of six. When we're done with that, let's create a second variable named array_b and assign array_a to it. Let's go ahead and run this and return both object identifiers. As you probably expected, they share the same ID and the same location in memory, because both names point to the same object.
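
Here's a minimal sketch of that step. It assumes "a range of six" means np.arange(6), and the variable names follow the guide.

```python
import numpy as np

# Assumption: "a range of six" is np.arange(6), the integers 0 through 5.
array_a = np.arange(6)

# Plain assignment does not create a new array; it adds a second name
# for the same object.
array_b = array_a

print(id(array_a) == id(array_b))  # True
print(array_b is array_a)          # True
```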

Now let's modify array_b by assigning it just a slice of array_a. Let's do zero to three. When we run it, each object now has its own ID. And since they have their own IDs, it might be assumed that each object is stored in its own memory location. But that's not really what's going on. What NumPy is doing is making a shallow copy, or view, of the original base array. The view is a new array object, which is why it has a unique ID and gives the appearance of independence. But in reality, the view's data is still a part of the base array.
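
The slicing step might look like the sketch below, again assuming array_a comes from np.arange(6). It only shows that the IDs differ; whether memory is shared needs a different check.

```python
import numpy as np

array_a = np.arange(6)

# Basic slicing returns a new ndarray object that looks at array_a's data.
array_b = array_a[0:3]

print(array_b.tolist())            # [0, 1, 2]
print(id(array_a) == id(array_b))  # False: two distinct Python objects
```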

You should also know that since array_b is just a view of array_a, any change made to an element of array_b will modify array_a as well. So really quickly, we'll change the first element in array_b from zero to 100. And as expected, both arrays have been changed.

In contrast to a view, NumPy also allows us to explicitly create copies of an array. Unlike a view, which is a shallow copy that shares memory with the base array, the copy function creates what we call a deep copy that has its own location in memory.
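
To see a view's shared memory in action, here's a short sketch of the change to the first element, under the same np.arange(6) assumption.

```python
import numpy as np

array_a = np.arange(6)
array_b = array_a[0:3]  # view of the first three elements

# Writing through the view writes into array_a's data.
array_b[0] = 100

print(array_b.tolist())  # [100, 1, 2]
print(array_a.tolist())  # [100, 1, 2, 3, 4, 5]
```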

Knowing all that, let's create a third array named array_c and assign it array_a, followed by the copy function. Assuming we did everything correctly, array_c won't be sharing memory with array_a or array_b, so any changes that we make to array_c should be independent from the others. And to make sure that's true, let's change the first element of array_c back from 100 to zero. Only array_c changes, while array_a and array_b still start with 100, so it looks like we did everything correctly.
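
A sketch of the copy step, continuing from the earlier state where the first element had been set to 100 through the view:

```python
import numpy as np

array_a = np.arange(6)
array_b = array_a[0:3]
array_b[0] = 100          # state carried over from the previous step

# copy() allocates a new buffer and duplicates the data into it.
array_c = array_a.copy()
array_c[0] = 0            # only array_c is affected

print(array_c.tolist())  # [0, 1, 2, 3, 4, 5]
print(array_a.tolist())  # [100, 1, 2, 3, 4, 5]
print(array_b.tolist())  # [100, 1, 2]
```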

Finally, I'd like to show you a couple of tricks to help you figure out if you're working with a view or a copy. As you've already seen throughout this guide, simply comparing IDs doesn't work, because a view has its own ID. So the first method leverages a NumPy function called may_share_memory, which returns a Boolean. As the name suggests, True means the two arrays might share memory, while False means they definitely don't. The second method requires us to ask if an array is the base array or if it borrows its memory from another array.
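
Here's a sketch of the first method using the three arrays from this guide. The last few lines go a little beyond the guide: they use np.shares_memory, NumPy's exact but potentially slower check, to show a case where may_share_memory answers True for two views that don't overlap. The names evens and odds are only for illustration.

```python
import numpy as np

array_a = np.arange(6)
array_b = array_a[0:3]    # view
array_c = array_a.copy()  # independent copy

print(np.may_share_memory(array_a, array_b))  # True
print(np.may_share_memory(array_a, array_c))  # False

# may_share_memory only compares memory bounds, so it can report True
# for views whose elements interleave without actually overlapping.
evens = array_a[::2]   # hypothetical names, not from the guide
odds = array_a[1::2]
print(np.may_share_memory(evens, odds))  # True
print(np.shares_memory(evens, odds))     # False
```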

To determine if an array is a base array, we pass in the name of the array, then .base, followed by is None. If the array is a base, meaning it owns its own memory, then we'll have True returned. If we know an array isn't a base but instead want to figure out which array it's a view of, we can do that as well. For example, say we want to determine if array_a is the base array for array_b. To check that, we pass in array_b.base, followed by is array_a. If array_a is the base, True should be returned. If it's not the base, then False will be returned.
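
The base checks can be sketched like this, assuming the same three arrays as before:

```python
import numpy as np

array_a = np.arange(6)
array_b = array_a[0:3]    # view of array_a
array_c = array_a.copy()  # independent copy

print(array_a.base is None)     # True: array_a owns its data
print(array_b.base is None)     # False: array_b borrows its data
print(array_b.base is array_a)  # True: array_a is array_b's base
print(array_c.base is None)     # True: a copy owns its data too
```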

There are obviously pros and cons to both view and copy objects, but the biggest benefit of a view is that it saves time and memory, because the data doesn't have to be duplicated into an entirely new array. Instead, small chunks can be extracted through slicing. In contrast, the benefit of a copy is that it's not going to be affected by modifications made to the original data, which comes in super handy when you're doing any type of exploratory analysis. So just in case you screw something up, you know you'll have a backup.

And that about brings us to the end of this guide, so I will see you in the next one.