Overview of Data Type Objects

This is going to be a fairly short guide, but we're going to start our high-level breakdown of data type objects. As I mentioned in the previous guide, every array object is associated with a data type object, and the dtype describes different aspects of the data. For now, we'll be covering the first three characteristics listed in NumPy's documentation: the type of the data, the size of the data, and the byte order of the data.

I'm going to start by just showing you the general commands to access all the information, then we'll go back through for an actual explanation. Let's start by creating a simple 12-element array that has three rows and four columns. To find the data type object associated with the array, we write the name of the n-dimensional array followed by .dtype. To get information about the memory layout of the array, we use x.flags. And the last thing we'll look at is x.strides.
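As a minimal sketch of those three commands, assuming the array is called x and is built with np.arange (the guide doesn't say how the array was created), it might look like the following. The dtype is set explicitly to int64 here so the results match on every platform, since the default integer size can differ.

```python
import numpy as np

# A 12-element array with three rows and four columns.
# dtype=np.int64 is an assumption made for reproducibility.
x = np.arange(12, dtype=np.int64).reshape(3, 4)

print(x.dtype)    # the data type object associated with the array
print(x.flags)    # memory layout information (C_CONTIGUOUS, F_CONTIGUOUS, ...)
print(x.strides)  # bytes to step along each dimension
```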

According to the output, the data type is an integer and the size of each element is 64 bits. The memory layout is C contiguous, and the tuple refers to the number of bytes in memory between the beginnings of successive array elements along each dimension, which I'll explain a little bit more as well.
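If you want to read the type, size, and byte order directly off the dtype, here is a small sketch. It assumes the same int64 array as before; name, itemsize, and byteorder are attributes of NumPy's dtype object.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)
dt = x.dtype

print(dt.name)          # int64
print(dt.itemsize)      # 8 (bytes per element)
print(dt.itemsize * 8)  # 64 (bits per element)
print(dt.byteorder)     # '=' means the machine's native byte order
```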

By default, the way the matrix is laid out in memory follows the C order, or row-major order. So when it's mapped out, it will go from row to row. There's also something called Fortran order, or column-major order, and that works by going from column to column instead. Looking back at our original output, we see that C_CONTIGUOUS is true. So that means right now we're using the default of row-major order. When everything's mapped in memory, it's going to be going from row to row, but internally our computer memory doesn't actually have a way of understanding the concept of rows and columns.
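To see the two layouts side by side, here's a rough sketch. The names x_c and x_f are just illustrative, and np.asfortranarray is used to get a column-major copy of the same values.

```python
import numpy as np

x_c = np.arange(12, dtype=np.int64).reshape(3, 4)  # default: row-major (C order)
x_f = np.asfortranarray(x_c)                       # same values, column-major (Fortran order)

print(x_c.flags['C_CONTIGUOUS'], x_c.strides)  # True (32, 8)
print(x_f.flags['F_CONTIGUOUS'], x_f.strides)  # True (8, 24)

# order='K' reads the elements in the order they sit in memory.
print(x_c.ravel(order='K'))  # row after row
print(x_f.ravel(order='K'))  # column after column
```

Both arrays hold the same matrix; only the order of the elements in memory, and therefore the strides, changes.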

So you can kind of think of RAM as just being a one-dimensional strip with everything having its own linear position. And the way NumPy gets around this is by using something called strides. If you're not familiar with strides, the term just refers to the number of bytes you step over to get from one array element to the next along a given dimension. And that was represented by the tuple. In ours, we had a tuple that contained 32 and eight. What that means is there are 32 bytes to step from one row to the next and eight bytes to step from one column to the next.

Knowing our matrix is laid out using row-major order, it sits in RAM as one row after another. To go to the same position in the next row, there needs to be a step, or stride, of 32 bytes, because each row holds four elements of eight bytes each. Going from one column element to the next within a row, there's a stride of eight bytes. Relating that back to the initial element size, it all makes sense because there are eight bits in every byte. So that eight-byte stride is a total of 64 bits, which is the distance from one element to the next element.
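The stride arithmetic can be checked by hand. In this sketch, byte_offset is a hypothetical helper written just for illustration (it isn't part of NumPy), and it assumes the same 3 by 4 int64 array.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)

def byte_offset(arr, index):
    """Bytes from the start of the array's data to arr[index] (illustrative helper)."""
    return sum(i * s for i, s in zip(index, arr.strides))

offset = byte_offset(x, (2, 1))  # 2 * 32 + 1 * 8
print(offset)                    # 72

# Pull the 8 bytes at that offset out of the raw buffer and decode them.
raw = x.tobytes()
value = np.frombuffer(raw, dtype=np.int64, count=1, offset=offset)[0]
print(value, x[2, 1])            # 9 9
```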

The reason this is so helpful is that when we take slices of an array, like we do when we're segmenting data, NumPy can describe the slice with a new set of strides over the same memory. We don't have to have these massive chunks of data iterated over, copied, and returned back, which saves us a bunch of processing time.
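Here's a small sketch of that idea, taking every other column as an example slice (the slice itself is just an illustration). The result shares memory with the original array and only its strides differ.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)

y = x[:, ::2]                  # every other column, no data copied
print(y.shape, y.strides)      # (3, 2) (32, 16)
print(np.shares_memory(x, y))  # True

# Because y is a view, writing through it changes x as well.
y[0, 0] = 100
print(x[0, 0])                 # 100
```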

So those are just a few of the basics when it comes to data type objects. And in the next guide, we're going to expand the topic a little bit more and discuss structured arrays as well as how to switch between different data types.