As part of my **Data Analysis in R** course on **Udacity**, I’m publishing the results of an exploratory data analysis (EDA) I did on atmospheric nitric oxide (NO) and nitrogen dioxide (NO2) concentrations somewhere in Cambridge (UK).

You can find the datasets on the Air Quality England website: http://www.airqualityengland.co.uk/local-authority/data?la_id=51

I plotted the levels of NO in the atmosphere over a span of several days and noticed that they tend to follow a daily cycle. The levels probably go down during the night and go back up when it warms up during the day.
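
One way to check the daily cycle numerically is to fold the time axis onto a single day and average the readings for each hour of the day. The sketch below does this on synthetic data, not on the Cambridge measurements: the `demo` data frame, its sine-shaped cycle and the `hour_of_day` column are all made up for illustration, and it assumes the series starts at midnight.

```r
# Synthetic stand-in for the real data: 7 days of hourly readings with a
# built-in 24-hour cycle (values are invented for illustration only).
set.seed(1)
hours <- 0:(7 * 24 - 1)
demo <- data.frame(
  hour = hours,
  NO   = 20 + 10 * sin(2 * pi * (hours - 6) / 24) + rnorm(length(hours), sd = 0.5)
)

# Fold the time axis onto a single day and average each hour of the day.
demo$hour_of_day <- demo$hour %% 24
profile <- aggregate(NO ~ hour_of_day, data = demo, FUN = mean)

# Locate the highest and lowest points of the averaged 24-hour profile.
peak_hour   <- profile$hour_of_day[which.max(profile$NO)]
trough_hour <- profile$hour_of_day[which.min(profile$NO)]
print(profile)
```

If the cycle is real, the 24-row profile should show a clear peak and trough; a flat profile would suggest that the pattern in the plot is mostly noise.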

Viewed on a larger scale, the daily mean levels of NO seem to show no noticeable trend.
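
A rough way to back up a "no trend" reading is to fit a straight line through the daily means and look at the slope. This is only a sketch on synthetic data: `demo_by_day` and its values are invented for illustration, and a linear fit is just one possible check, not something the original analysis did.

```r
# Synthetic daily means with no built-in trend (invented values, not the
# Cambridge data): 100 days scattered around a constant level.
set.seed(42)
demo_by_day <- data.frame(day = 0:99)
demo_by_day$mean_NO <- 20 + rnorm(nrow(demo_by_day), sd = 5)

# Fit a straight line through the daily means.
fit <- lm(mean_NO ~ day, data = demo_by_day)

# The slope is the estimated change in mean NO per day; a value close to
# zero relative to its standard error is consistent with "no trend".
slope <- unname(coef(fit)["day"])
print(summary(fit)$coefficients)
```

With the real data, the same call would be made on the data frame of daily means, with the `day` column as the predictor.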

A plot of the levels of NO against the levels of NO2 shows a noticeable positive correlation.

Pearson’s product-moment correlation between the NO and NO2 concentrations comes out at 0.6968, which supports the observation.
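
For readers who want to see what that number measures, here is a small sketch that computes Pearson's r by hand and compares it with R's built-in `cor()`. The `x` and `y` vectors are synthetic values generated for illustration, not the NO and NO2 readings, so the result will not match 0.6968.

```r
# Synthetic, positively related pairs (invented values for illustration).
set.seed(7)
x <- runif(200, min = 0, max = 150)
y <- 10 + 0.3 * x + rnorm(200, sd = 15)

# Pearson's r: the sum of products of deviations from the mean, divided by
# the square root of the product of the summed squared deviations.
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in cor() uses the Pearson method by default.
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree up to floating-point error. `cor.test()` reports the same estimate along with a confidence interval and p-value.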

Here’s the R code that produced these plots, for those of you who are interested:

```
# Data source: http://www.airqualityengland.co.uk/local-authority/data?la_id=51
library(ggplot2)
library(dplyr)
library(grid)
library(gridExtra)

# Load the three CSV exports and stack them into one data frame
ds1 <- read.csv("2014-05-07-141107012512.csv")
ds2 <- read.csv("2014-08-05-141107012512.csv")
ds3 <- read.csv("2014-11-04-141107012512.csv")
dataset <- rbind(ds1, ds2, ds3)

# Parse the end date and time into seconds since the epoch; rows that do not
# match the format become NA and are dropped
dataset$timestamp <- as.numeric(strptime(paste(dataset$End.Date, dataset$End.Time), format = "%d/%m/%Y %H:00:00"))
dataset <- dataset[!is.na(dataset$timestamp), ]

# Hours elapsed since the first reading
dataset$hour <- dataset$timestamp / 3600
dataset$hour <- dataset$hour - min(dataset$hour)
dataset$day <- floor(dataset$timestamp / 86400) # floor() bins by whole elapsed days; round() would shift each bin boundary by 12 hours
dataset$day <- dataset$day - min(dataset$day)

# Hourly NO over the first 200 hours (top) and the first 500 hours (bottom)
sp1 <- ggplot(aes(x = hour, y = NO), data = dataset) +
  ylim(c(0, 150)) +
  geom_line(color = "#334455") +
  scale_x_continuous(breaks = seq(0, 200, 24), limits = c(0, 200)) +
  labs(x = "Hour Since Start", y = "Nitric Oxide Concentration")
sp2 <- ggplot(aes(x = hour, y = NO), data = dataset) +
  ylim(c(0, 150)) +
  geom_line(color = "#334455") +
  scale_x_continuous(breaks = seq(0, 500, 24), limits = c(0, 500)) +
  labs(x = "Hour Since Start", y = "Nitric Oxide Concentration")
grid.arrange(sp1, sp2)

# Daily mean NO over the first 50 days (top) and the first 200 days (bottom)
dataset.by_day <- dataset %>%
  group_by(day) %>%
  summarise(mean_NO = mean(NO))
sp1 <- ggplot(aes(x = day, y = mean_NO), data = dataset.by_day) +
  ylim(c(0, 100)) +
  geom_line(color = "#334455") +
  scale_x_continuous(breaks = seq(0, 200, 7), limits = c(0, 50)) +
  labs(x = "Days Since Start", y = "Mean NO Concentration")
sp2 <- ggplot(aes(x = day, y = mean_NO), data = dataset.by_day) +
  ylim(c(0, 100)) +
  geom_line(color = "#334455") +
  scale_x_continuous(breaks = seq(0, 200, 7), limits = c(0, 200)) +
  labs(x = "Days Since Start", y = "Mean NO Concentration")
grid.arrange(sp1, sp2)

# Scatter plot of NO against NO2, jittered and with a smoothed trend line
sp1 <- ggplot(aes(x = NO, y = NO2), data = dataset) +
  xlim(c(0, 150)) +
  ylim(c(0, 100)) +
  geom_point(alpha = 1/5, position = position_jitter(width = 0.8, height = 0.8)) +
  geom_smooth() +
  labs(x = "Nitric oxide concentration", y = "Nitrogen dioxide concentration")
grid.arrange(sp1)

# Pearson's product-moment correlation between NO and NO2
with(dataset, cor.test(x = NO, y = NO2)) # 0.6968017
```