I am wondering if it is possible to merge different rows of a data frame if they have one field in common:
Input:
    df = rbind(c("01/01/2016",01:02:30,"100","character(0)","file a"),
               c("02/01/2016",9:02:30,"character(0)", 3, "file a"),
               c("02/01/2016",8:30:30,"200","character(0)","file b"),
               c("03/01/2016",8:25:30,"50","character(0)","file c"),
               c("04/01/2016",17:20:30,"character(0)","600","file b"))
Output:
    df = rbind(c(01/01/2016,01:02:30,"100",3,"file a"),
               c(02/01/2016,8:30:30,"200",600,"file b"),
               c(03/01/2016,8:25:30,"50","character(0)","file c"))
So, as you can see, I merge rows according to the last value (file a, file b or file c). I need to keep the earliest of the dates. E.g. "file a" has 2 dates, 01/01/2016 and 02/01/2016, and I want to keep the earlier one. There won't be more than 2 rows to merge per value.
We want to keep the earliest date.
Based on the comments, you want to find the first instance (ordered by one column) of a non-missing value in each column, based on a grouping column (in this case the "file a/b/c" column).
First you have to clean up the data a bit. Your data load step is buggy due to misplaced quotation marks around the timestamps. Also, I'm assuming you want to represent missing values with the character(0) values. If so, use NAs instead. Here is the data initialisation and cleaning step:
    # prepare the data
    df = data.frame(v1 = c("01/01/2016 01:02:30", "02/01/2016 9:02:30", "02/01/2016 8:30:30",
                           "03/01/2016 8:25:30", "04/01/2016 17:20:30"),
                    v2 = c("100", "character(0)", "200", "50", "character(0)"),
                    v3 = c("character(0)", "3", "character(0)", "character(0)", "600"),
                    v4 = c("file a", "file a", "file b", "file c", "file b"))

    # replace the "character(0)" strings with NAs to mark missing values
    df[df == "character(0)"] <- NA

    # convert the character dates to date-times
    df$v1 <- strptime(as.character(df[ ,1]), format = "%d/%m/%Y %H:%M:%S")
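To illustrate why the unquoted timestamps in the question do not behave like times, here is a minimal sketch. It assumes default R behaviour, where the colon is the sequence operator and a left-hand side with more than one element only produces a warning; the variable names are just for illustration.

```r
# Unquoted, 01:02:30 is parsed as (1:2):30, i.e. the sequence operator
# applied twice, not as a time of day. R warns that only the first element
# of 1:2 is used and returns the integer sequence 1:30.
unquoted <- suppressWarnings(01:02:30)
length(unquoted)

# Quoted, the timestamp stays a single character value that can be parsed later.
quoted <- "01:02:30"
length(quoted)
```

This is why the timestamp has to sit inside the quotation marks together with the date.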
I've named the columns v1..4; you may want more descriptive names. Next you'll need to fill the missing values of the columns using the zoo package's na.locf() function. To eliminate cross-contamination of data across different values of the v4 column, I'm looping through the data. (There might be a better solution for this...) Here is the function for the custom row merging:
    library(zoo)  # provides na.locf()

    custom_row_merge <- function(df, sort_by, group_by){
      # sort by the grouping column, then by date (earliest first)
      df <- df[order(df[,group_by], df[,sort_by]), ]

      # select the columns to merge
      columns_to_merge <- names(df)[!(names(df) %in% c(sort_by, group_by))]

      # fill the data for each unique value of the group column
      for (file_type in unique(df[, group_by])){
        row_indices <- (df[,group_by] == file_type)

        # fill the missing values of each column that is not the group or sort column
        for (column_name in columns_to_merge){
          df[row_indices, column_name] <- na.locf(df[row_indices, column_name],
                                                  na.rm = FALSE, fromLast = TRUE)
        }
      }

      # keep the first occurrence of each file, with the filled values
      return(df[!duplicated(df[, group_by]), ])
    }
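To see what the fill step does on its own, here is a small sketch on a made-up vector (the values are purely illustrative, and the zoo package is assumed to be installed). With fromLast = TRUE each missing value is replaced by the next non-missing value after it, and na.rm = FALSE keeps a trailing NA in place so the length does not change.

```r
library(zoo)

# illustrative vector with gaps; not taken from the question's data
x <- c(NA, 3, NA, 5, NA)

# carry the next observation backward; the last NA has nothing after it
# and therefore stays NA
filled <- na.locf(x, na.rm = FALSE, fromLast = TRUE)
filled
# 3 3 5 5 NA
```

Because the rows are sorted by date within each group, the first row of a group ends up holding the earliest non-missing value of every column.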
Here is the original data frame:
    > df
                       v1   v2   v3     v4
    1 2016-01-01 01:02:30  100 <NA> file a
    2 2016-01-02 09:02:30 <NA>    3 file a
    3 2016-01-02 08:30:30  200 <NA> file b
    4 2016-01-03 08:25:30   50 <NA> file c
    5 2016-01-04 17:20:30 <NA>  600 file b
And here is the one produced by the function, matching what you describe in the question:
    > custom_row_merge(df, "v1", "v4")
                       v1  v2   v3     v4
    1 2016-01-01 01:02:30 100    3 file a
    3 2016-01-02 08:30:30 200  600 file b
    4 2016-01-03 08:25:30  50 <NA> file c
You can of course fill the missing values with character(0) values afterwards if you want to.
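For comparison, the same "first non-missing value per group" idea can be sketched in base R without zoo, by splitting on the grouping column. This is only one possible variant, not part of the answer's method: the function name first_non_na_per_group is hypothetical, and the example data is rebuilt here with NAs already in place and the dates stored as POSIXct.

```r
# hypothetical helper: first non-missing value of each column per group,
# with rows ordered by the sort column inside each group
first_non_na_per_group <- function(df, sort_by, group_by) {
  df <- df[order(df[[group_by]], df[[sort_by]]), ]
  value_cols <- setdiff(names(df), c(sort_by, group_by))

  pieces <- lapply(split(df, df[[group_by]]), function(g) {
    out <- g[1, , drop = FALSE]  # earliest row of the group
    for (col in value_cols) {
      # indexing an empty vector with [1] yields an NA of the column's type
      out[[col]] <- g[[col]][!is.na(g[[col]])][1]
    }
    out
  })

  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# same values as in the question, already cleaned
df <- data.frame(
  v1 = as.POSIXct(c("2016-01-01 01:02:30", "2016-01-02 09:02:30",
                    "2016-01-02 08:30:30", "2016-01-03 08:25:30",
                    "2016-01-04 17:20:30"), tz = "UTC"),
  v2 = c("100", NA, "200", "50", NA),
  v3 = c(NA, "3", NA, NA, "600"),
  v4 = c("file a", "file a", "file b", "file c", "file b"),
  stringsAsFactors = FALSE
)

merged <- first_non_na_per_group(df, "v1", "v4")
merged
```

Splitting keeps each file's rows separate, so values cannot leak from one group into another.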