- Jun 2022
-
data-feminism.mitpress.mit.edu
-
80% of data analysis is spent on the process of cleaning and preparing the data
Imagine having unnecessary and wrong data in your document. You would most likely experience the concept of time demarcation: the reluctance to go through every single row and column to eliminate this "garbage data". Owning all kinds of data without organizing it feels like stuffing your closet with clothes that you should have donated 5 years ago. It is a time-consuming and soul-destroying process. Luckily, R has the tidyverse, a collection of packages that I believe the author talks about in the next paragraph, to make life easier for everyone. I personally use dplyr and ggplot2 when I clean data, and they are extremely helpful. Without these packages, I have no idea when I would be able to reach the final step of data visualization.
-
-
www.tidyverse.org
-
across() is very useful within summarise() and mutate(), but it’s hard to use it with filter() because it is not clear how the results would be combined into one logical vector. So to fill the gap, we’re introducing two new functions if_all() and if_any().
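A minimal sketch of the two functions mentioned in the quote, assuming a dplyr version recent enough to include if_all() and if_any(). The scores table and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented for this example.
scores <- tibble(
  id     = 1:4,
  test_a = c(90, 40, 75, 30),
  test_b = c(85, 80, 45, 20)
)

# Keep rows where EVERY test_* column is at least 50.
passed_all <- scores %>%
  filter(if_all(starts_with("test_"), ~ .x >= 50))

# Keep rows where AT LEAST ONE test_* column is at least 50.
passed_any <- scores %>%
  filter(if_any(starts_with("test_"), ~ .x >= 50))
```

if_all() combines the per-column results with "and", if_any() with "or", which is the combination step the quote says across() leaves ambiguous inside filter().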
-
- May 2022
-
stackoverflow.com
-
df %>% group_by(cat) %>% mutate(id = row_number())
Numbers the rows within each group (a per-group index).
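A small worked example of the one-liner above. The data frame contents are hypothetical; only the group_by() and row_number() pattern comes from the quoted answer.

```r
library(dplyr)

# Hypothetical data: two categories in interleaved order.
df <- tibble(
  cat   = c("a", "a", "b", "a", "b"),
  value = c(10, 20, 30, 40, 50)
)

# row_number() restarts at 1 inside each group defined by group_by().
numbered <- df %>%
  group_by(cat) %>%
  mutate(id = row_number()) %>%
  ungroup()  # drop the grouping so later steps are not affected by it
```

Rows keep their original order; only the id counter is computed per group.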
-
- Oct 2020
-
tidyr.tidyverse.org
-
dplyr::coalesce() to replace NAs with values from other vectors.
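A short sketch of how coalesce() fills missing values, using made-up vectors. It takes, position by position, the first non-missing value among its arguments.

```r
library(dplyr)

# Hypothetical vectors with gaps in different positions.
x <- c(1, NA, 3, NA)
y <- c(10, 20, NA, NA)

# Take x where available, then y, then fall back to 0.
filled <- coalesce(x, y, 0)
```

The trailing 0 acts as a default for positions that are missing in every vector.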
-
- Apr 2020
-
cran.r-project.org
-
Adding variable labels using pipe
-
- Mar 2020
-
-
dplyr in R also lets you use a different syntax for querying SQL databases like Postgres, MySQL and SQLite, which is also in a more logical order
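A rough sketch of what the quote describes, assuming the dbplyr, DBI and RSQLite packages are installed. The in-memory SQLite database, the flights table and its columns are invented for illustration.

```r
library(dplyr)
library(dbplyr)

# Assumption: a throwaway in-memory SQLite database stands in for a real server.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(
  con, "flights",
  data.frame(carrier = c("AA", "AA", "UA"), delay = c(5, 15, 30))
)

# The verbs read in the order the steps happen: filter, group, summarise.
query <- tbl(con, "flights") %>%
  filter(delay > 10) %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(delay, na.rm = TRUE))

show_query(query)                              # print the SQL that dbplyr generates
result <- collect(query) %>% arrange(carrier)  # run the query and fetch the rows

DBI::dbDisconnect(con)
```

Nothing is sent to the database until collect() is called, so the pipeline can be built up and inspected first.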
-
- Feb 2020
-
www.r-bloggers.com
-
Now this can be simplified using the new {{}} syntax:

    summarise_groups <- function(dataframe, grouping_var, column_name) {
      dataframe %>%
        group_by({{grouping_var}}) %>%
        summarise({{column_name}} := mean({{column_name}}, na.rm = TRUE))
    }

Much easier and cleaner! You still have to use the := operator instead of = for the column name, however. Also, from my understanding, if you want to modify the column names, for instance to return "mean_height" instead of height in this case, you have to keep using the enquo()–!! syntax.
curly curly syntax
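The quoted post says that renaming the output column still needs enquo() and !!. As far as I know, more recent releases of rlang and dplyr also accept a glue-style string on the left-hand side of :=, so the sketch below may be an alternative. The people table is hypothetical and this is not the post author's code.

```r
library(dplyr)

summarise_groups <- function(dataframe, grouping_var, column_name) {
  dataframe %>%
    group_by({{ grouping_var }}) %>%
    # Assumption: glue-style names on the left of := are supported by the
    # installed rlang; "mean_{{ column_name }}" becomes e.g. "mean_height".
    summarise("mean_{{ column_name }}" := mean({{ column_name }}, na.rm = TRUE))
}

# Hypothetical data to exercise the function.
people <- tibble(
  sex    = c("f", "f", "m", "m"),
  height = c(160, 170, NA, 180)
)

result <- summarise_groups(people, sex, height)
```

The := operator is still required here because the column name is computed rather than written literally.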
-