ggplot2 - How to group factors with low frequency into an 'other' factor in R


    # generate a counts table
    library(ggplot2)  # provides the diamonds dataset
    library(plyr)     # provides count()
    example <- data.frame(count(diamonds, c('color', 'cut')))
    example[1:3,]

    # excerpt of the table
      color       cut freq
    1     D      Fair  163
    2     D      Good  662
    3     D Very Good 1513

You can filter the table to freq > 1000 with: example[example$freq > 1000,]. I would like to generate a similar table, except that values below some threshold, e.g. 1000, are combined into a single (Other) row. Something similar happens when you have many factor levels and call summary(example, maxsum=3).

         color         cut          freq
     D      : 5   Fair   : 7   Min.   : 119
     E      : 5   Good   : 7   1st Qu.: 592
     (Other):25   (Other):21   Median :1204
                               Mean   :1541
                               3rd Qu.:2334
                               Max.   :4884

Example of ideal output:

Ideally, I want to convert example[example$color=='J',]:

     color       cut freq
         J      Fair  119
         J      Good  307
         J Very Good  678
         J   Premium  808
         J     Ideal  896

and produce this:

     color       cut freq
         J Very Good  678
         J   Premium  808
         J     Ideal  896
         J   (Other)  426

Bonus: if this kind of filtering is possible within ggplot, so that the plot below is created with the filtering applied, that would be great too.

ggplot(example, aes(x=color, y=freq)) + geom_bar(aes(fill=cut), stat = "identity") 


Here is an alternative using dplyr that pipes the corrected data directly into the ggplot call.

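    library(dplyr)
    example %>%
      # relabel low-frequency cuts; as.character(cut) keeps each row's own label
      # (levels(cut) would only line up by recycling when rows are sorted by cut)
      mutate(cut = ifelse(freq < 500, "other", as.character(cut))) %>%
      group_by(color, cut) %>%
      # add up the rows that now share the "other" label
      summarise(freq = sum(freq)) %>%
      ggplot(aes(color, freq, fill = cut)) +
      geom_bar(stat = "identity")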

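The relabel-then-sum idea can be checked without any plotting. Below is a minimal base R sketch on the five color J rows quoted in the question. The data frame name j_rows, the threshold variable, and the hand-typed data are assumptions made for illustration, and aggregate() stands in for the dplyr verbs.

```r
# The five color "J" rows quoted in the question (hand-typed for this sketch)
j_rows <- data.frame(
  color = "J",
  cut   = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  freq  = c(119, 307, 678, 808, 896),
  stringsAsFactors = FALSE
)

threshold <- 500  # assumed cut-off, matching the dplyr answer

# Relabel every cut whose count falls below the threshold
j_rows$cut <- ifelse(j_rows$freq < threshold, "(Other)", j_rows$cut)

# Sum the counts within each (color, cut) pair
collapsed <- aggregate(freq ~ color + cut, data = j_rows, FUN = sum)
print(collapsed)
```

The (Other) row should hold 119 + 307 = 426, the figure in the ideal output above. The row order that aggregate() returns may differ from the order shown in the question.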

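Be sure to detach plyr first, e.g. with detach("package:plyr"); otherwise plyr's summarise() can mask dplyr's and the output of the dplyr call will be incorrect.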
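If detaching plyr is inconvenient, one possible workaround is to call the dplyr verbs with an explicit dplyr:: prefix, so it no longer matters which package was attached last. This is a sketch under assumptions, not part of the original answer: it uses hand-typed rows for color J instead of the full table, and it relies on the .groups argument of summarise(), which needs dplyr 1.0.0 or later.

```r
# Option 1: detach plyr if it is currently attached
if ("package:plyr" %in% search()) detach("package:plyr")

# Hand-typed rows for color "J" from the question (illustrative data only)
j_rows <- data.frame(
  color = "J",
  cut   = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  freq  = c(119, 307, 678, 808, 896)
)

# Option 2: explicit dplyr:: prefixes select dplyr's verbs
# even when plyr is attached and masks mutate() or summarise()
relabelled <- dplyr::mutate(
  j_rows,
  cut = ifelse(freq < 500, "other", as.character(cut))
)
grouped   <- dplyr::group_by(relabelled, color, cut)
collapsed <- dplyr::summarise(grouped, freq = sum(freq), .groups = "drop")

print(collapsed)
```

Either option alone should be enough. The prefix form is more verbose but keeps working if another script later attaches plyr again.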

