Q:
How to calculate percentage of time using the values in columns of a dataframe
I am fairly new to data analysis and struggling to understand the syntax for a problem I am trying to solve. I have a dataframe with start and end dates of some events, and am trying to calculate the proportion of events that occurred between start and end date. My data looks like this:
start end event
2014-05-05 2014-05-07 event 1
2014-06-01 2014-06-02 event 2
2014-06-04 2014-06-05 event 3
2014-06-08 2014-06-09 event 4
2014-06-14 2014-06-15 event 5
2014-06-21 2014-06-23 event 6
2014-06-29 2014-06-30 event 7
and I want to calculate the proportion of events that occurred between 2014-05-06 (start) and 2014-06-25 (end) for each week, so the result would look like this:
week start end percentage
1 2014-05-06 2014-05-15 0.00%
2 2014-05-16 2014-05-31 0.00%
3 2014-06-01 2014-06-15 20.00%
4 2014-06-16 2014-06-29 20.00%
5 2014-06-30 2014-07-05 60.00%
6 2014-07-06 2014-07-12 0.00%
7 2014-07-13 2014-07-19 0.00%
8 2014-07-20 2014-07-26 0.00%
9 2014-07-27 2014-07-31 0.00%
10 2014-08-01 2014-08-07 0.00%
I would be very grateful for help in constructing a dplyr solution to this problem. Thanks a lot in advance!
A:
Here's an option with dplyr and lubridate. The date columns are converted to Date first so they can be subtracted, and the differences are converted to plain numbers because one difftime cannot be divided by another:
library(dplyr)
library(lubridate)
df %>%
  mutate(across(c(start, end), as.Date)) %>%        # character -> Date
  group_by(Week_num = week(start)) %>%              # week of the year of each start date
  mutate(Percentage = 100 * as.numeric(end - start) /
           as.numeric(max(end, start) - start)) %>% # event length relative to the latest date in its week
  distinct()
Output:
# A tibble: 7 × 5
# Groups: Week_num [6]
start end event Week_num Percentage
1 2014-05-05 2014-05-07 event 1 18 100
2 2014-06-01 2014-06-02 event 2 22 100
3 2014-06-04 2014-06-05 event 3 23 20
4 2014-06-08 2014-06-09 event 4 23 100
5 2014-06-14 2014-06-15 event 5 24 100
6 2014-06-21 2014-06-23 event 6 25 100
7 2014-06-29 2014-06-30 event 7 26 100
DATA:
df <- read.table(text =
"start end event
'2014-05-05' '2014-05-07' 'event 1'
'2014-06-01' '2014-06-02' 'event 2'
'2014-06-04' '2014-06-05' 'event 3'
'2014-06-08' '2014-06-09' 'event 4'
'2014-06-14' '2014-06-15' 'event 5'
'2014-06-21' '2014-06-23' 'event 6'
'2014-06-29' '2014-06-30' 'event 7'",
header = TRUE)
Edit: a weekly variant. To group by the calendar week that contains each start date rather than by week number, floor_date() from lubridate can be used:
df %>%
  mutate(across(c(start, end), as.Date)) %>%
  group_by(Week_start = floor_date(start, "week")) %>%
  mutate(Percentage = 100 * as.numeric(end - start) /
           as.numeric(max(end, start) - start)) %>%
  distinct()
or, to express each event's length as a share of a 7-day week:
df %>%
  mutate(across(c(start, end), as.Date)) %>%
  group_by(Week_num = week(start)) %>%
  mutate(Percentage = 100 * as.numeric(end - start) / 7) %>%
  distinct()
Output of the second variant:
# A tibble: 7 × 5
# Groups: Week_num [6]
start end event Week_num Percentage
1 2014-05-05 2014-05-07 event 1 18 28.6
2 2014-06-01 2014-06-02 event 2 22 14.3
3 2014-06-04 2014-06-05 event 3 23 14.3
4 2014-06-08 2014-06-09 event 4 23 14.3
5 2014-06-14 2014-06-15 event 5 24 14.3
6 2014-06-21 2014-06-23 event 6 25 28.6
7 2014-06-29 2014-06-30 event 7 26 14.3
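Also, if only the week number and the percentage are needed, reframe() (dplyr 1.1.0 or later; older versions use summarize() here) is a bit simpler:
df %>%
  mutate(across(c(start, end), as.Date)) %>%
  group_by(Week_num = week(start)) %>%
  reframe(Percentage = 100 * as.numeric(end - start) /
            as.numeric(max(end, start) - start))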
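The question itself asks for the share of events per week inside a fixed window, which is a count rather than a duration ratio. A minimal sketch of that reading follows, under assumptions that are not in the original post: weeks are 7-day bins counted from the window start, an event belongs to the bin that contains its start date, and events starting outside the window are ignored. The names window_start, window_end, weeks and result are illustrative only.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  start = as.Date(c("2014-05-05", "2014-06-01", "2014-06-04", "2014-06-08",
                    "2014-06-14", "2014-06-21", "2014-06-29")),
  end   = as.Date(c("2014-05-07", "2014-06-02", "2014-06-05", "2014-06-09",
                    "2014-06-15", "2014-06-23", "2014-06-30")),
  event = paste("event", 1:7)
)

# Reporting window taken from the question
window_start <- as.Date("2014-05-06")
window_end   <- as.Date("2014-06-25")

# One row per 7-day bin; the last bin is cut off at window_end
weeks <- tibble(week_start = seq(window_start, window_end, by = "7 days")) %>%
  mutate(week = row_number(),
         week_end = pmin(week_start + 6, window_end))

result <- df %>%
  # Keep only events that start inside the window
  filter(start >= window_start, start <= window_end) %>%
  # Assumption: an event counts toward the bin containing its start date
  mutate(week = as.integer(start - window_start) %/% 7L + 1L) %>%
  count(week, name = "n_events") %>%
  # Bring back the bins that contain no events
  right_join(weeks, by = "week") %>%
  arrange(week) %>%
  mutate(n_events = coalesce(n_events, 0L),
         percentage = 100 * n_events / sum(n_events)) %>%
  select(week, week_start, week_end, n_events, percentage)

print(result)
```

With the question's data, five events start inside the window, so the bins come out as 20%, 40%, 20% and 20% for weeks 4 to 7 and 0% elsewhere. The bins in the question's expected table are not all 7 days long, so these numbers are not meant to reproduce that table; changing how weeks is built would adapt the sketch to other bin boundaries.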