Chapter 4 Missing Values

Here, we use the visna function from the extracat package (extracat::visna) to look at the missing value patterns. The bars beneath the columns in Figure 4.1 show the proportion of missing values for each variable. Hurricane diameter (hu_diameter) and storm diameter (ts_diameter) have the highest proportions of missing values, and the columns suggest that they follow the same missing pattern: when hurricane diameter is missing, storm diameter is also missing.

The third most frequently missing variable is pressure (min_pressure), and the columns show that pressure is missing only when hurricane diameter and storm diameter are also missing. The bars on the right show the relative frequencies of the missing patterns. The most frequent pattern is the one in which hurricane diameter, storm diameter and pressure are all missing, followed by the one in which only hurricane diameter and storm diameter are missing. Complete rows, with no missing values, form the third most frequent pattern.

Finally, hurricane diameter and storm diameter have the most missing values because they are calculated from wind radii, and wind radii were not used before 2004.

Figure 4.1: Missing Values Patterns
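
The quantities summarized in Figure 4.1 can also be tabulated directly. The following is a minimal sketch in base R, not the code used to produce the figure. The data frame `storms_toy` and its values are made up for illustration, and the `wind` column is an assumed extra variable; only `hu_diameter`, `ts_diameter` and `min_pressure` come from the text.

```r
# Toy data with made-up values; `wind` is an assumed, fully observed column.
storms_toy <- data.frame(
  wind         = c(30, 45, 60, 80, 35, 50),
  min_pressure = c(NA, NA, NA, 990, 985, 975),
  ts_diameter  = c(NA, NA, NA, NA, NA, 120),
  hu_diameter  = c(NA, NA, NA, NA, NA, 40)
)

# Logical matrix: TRUE where a value is missing.
miss <- is.na(storms_toy)

# Proportion of missing values per variable
# (the information carried by the bars beneath the columns).
prop_missing <- colMeans(miss)

# Label each row with the set of variables that are missing in it.
pattern <- apply(miss, 1, function(r) {
  if (any(r)) paste(names(r)[r], collapse = "+") else "complete"
})

# Relative frequency of each pattern, most frequent first
# (the information carried by the bars on the right).
pattern_freq <- sort(table(pattern) / nrow(storms_toy), decreasing = TRUE)

print(prop_missing)
print(pattern_freq)
```

On the toy data, the ordering of `pattern_freq` mirrors the one described above: all three variables missing, then the two diameters only, then complete rows.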

It is worth noting that almost all pressure data are missing from the 1850s to the 1940s. The proportion of missing data then decreases from the 1940s onward, and no pressure values are missing from the 2000s onward (see Figure 4.2). The reasons for this are the following: (1) in the early years, pressure was recorded by ships, which were few in number, so a lot of pressure data could not be recorded; (2) in more recent years, it became common practice to replace missing pressure values with an analytical product such as satellite data; (3) improvements in modern tools and technologies have provided a more powerful observation network than in the early years.

Figure 4.2: Proportion of Missing Air Pressure at the Storm’s Center By Year
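
The yearly proportion plotted in Figure 4.2 is the mean of a missingness indicator within each year. A minimal sketch in base R follows, under the assumption that the data have one row per observation with a `year` column. The data frame `pressure_toy` and its values are hypothetical, and this is not necessarily how the figure was produced.

```r
# Toy data with made-up values; `year` is an assumed column name.
pressure_toy <- data.frame(
  year         = c(1851, 1851, 1900, 1900, 1950, 1950, 2005, 2005),
  min_pressure = c(NA,   NA,   NA,   1002, NA,   990,  985,  970)
)

# For each year, the share of observations whose pressure is missing:
# the mean of a TRUE/FALSE indicator is the proportion of TRUE values.
prop_missing_by_year <- tapply(
  is.na(pressure_toy$min_pressure),
  pressure_toy$year,
  mean
)

print(prop_missing_by_year)
```

The result is a named vector indexed by year, which can be passed to any plotting function to draw a curve like the one in the figure.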

Another noticeable pattern in the missing values relates to the names of tropical cyclones (see Figure 4.3).

Before the 1950s, it was not common practice to name tropical cyclones. This explains why we cannot find any named cyclone before that time. A few cyclones were still left unnamed after that, and the reason can be linked to their category: those cyclones are of category 0 or 1 (more information about categories can be found in part 5), which means that they could have been overlooked and therefore did not receive a name.

Figure 4.3: Cyclones received names starting in the 1950s