r - Merge rows from same dataframe -

I am wondering if it is possible to merge different rows of a dataframe if they have one field in common:

input:

    df = rbind(c("01/01/2016",01:02:30,"100","character(0)","file a"),
               c("02/01/2016",9:02:30,"character(0)", 3, "file a"),
               c("02/01/2016",8:30:30,"200","character(0)","file b"),
               c("03/01/2016",8:25:30,"50","character(0)","file c"),
               c("04/01/2016",17:20:30,"character(0)","600","file b"))

output:

    df = rbind(c(01/01/2016,01:02:30,"100",3,"file a"),
               c(02/01/2016,8:30:30,"200",600,"file b"),
               c(03/01/2016,8:25:30,"50","character(0)","file c"))

As you can see, I want to merge rows according to the last value (file a, file b or file c), and I need to keep the earliest of the dates. E.g. "file a" has 2 dates, 01/01/2016 and 02/01/2016, and I want to keep the first one. I won't merge more than 2 rows per value.

We want to keep the earliest date.

Based on your comments, you want to find the first instance (ordered by one column) of a non-missing value in each column, based on a grouping column (in this case the "file a/b/c" column).

First you have to clean your data a bit. Your data load step is buggy due to misplaced quotation marks around the timestamps. Also, I'm assuming you want to represent missing values with the character(0) values. If so, use NAs instead. Here is the data initialisation and cleaning step:

    # prepare the data
    df = data.frame(v1 = c("01/01/2016 01:02:30", "02/01/2016 9:02:30", "02/01/2016 8:30:30",
                           "03/01/2016 8:25:30", "04/01/2016 17:20:30"),
                    v2 = c("100", "character(0)", "200", "50", "character(0)"),
                    v3 = c("character(0)", "3", "character(0)", "character(0)", "600"),
                    v4 = c("file a", "file a", "file b", "file c", "file b"))

    # replace the character(0)s with NAs to mark missing values
    df[df == "character(0)"] <- NA

    # convert the character dates to date-times (the format codes are case-sensitive);
    # as.POSIXct() stores the result as a plain column that order() can sort
    df$v1 <- as.POSIXct(strptime(as.character(df[, 1]), format = "%d/%m/%Y %H:%M:%S"))

I've named the columns v1..4; you may want more descriptive names. What you need is to fill the missing values of the columns using the zoo package's na.locf() function. To eliminate cross-contamination of data across different values of the v4 column, I'm looping through the data. (There might be a better solution for this...) Here is the function for the custom row merging:

    library(zoo)  # provides na.locf()

    custom_row_merge <- function(df, sort_by, group_by) {

        # sort by group, then by date in increasing order within each group
        df <- df[order(df[, group_by], df[, sort_by]), ]

        # select the columns to merge
        columns_to_merge <- names(df)[!(names(df) %in% c(sort_by, group_by))]

        # fill the data for each unique value of the group column
        for (file_type in unique(df[, group_by])) {

            row_indices <- (df[, group_by] == file_type)

            # fill the missing values of each column that is not the group or sort column
            for (column_name in columns_to_merge) {
                df[row_indices, column_name] <- na.locf(df[row_indices, column_name],
                                                        na.rm = FALSE,
                                                        fromLast = TRUE)
            }
        }

        # first occurrence of each file, with filled values
        return(df[!duplicated(df[, group_by]), ])
    }
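To see what the na.locf() call does in isolation, here is a minimal sketch on a made-up vector (the vector x is purely illustrative and assumes the zoo package is installed). With fromLast = TRUE each missing value is filled from the next non-missing value after it, and na.rm = FALSE keeps a trailing NA that has nothing after it to borrow from:

```r
library(zoo)

# Illustrative vector: NAs mark the gaps to be filled
x <- c(NA, 3, NA, 600, NA)

# Carry the next observation backward; keep NAs that cannot be filled
filled <- na.locf(x, na.rm = FALSE, fromLast = TRUE)
print(filled)
```

This is why the first row of each group ends up holding the first non-missing value of every column once the group is sorted by date.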

Here is the original data frame:

    > df
                       v1   v2   v3     v4
    1 2016-01-01 01:02:30  100 <NA> file a
    2 2016-01-02 09:02:30 <NA>    3 file a
    3 2016-01-02 08:30:30  200 <NA> file b
    4 2016-01-03 08:25:30   50 <NA> file c
    5 2016-01-04 17:20:30 <NA>  600 file b

And here is the one produced by the function, matching what you describe in your question:

    > custom_row_merge(df, "v1", "v4")
                       v1  v2   v3     v4
    1 2016-01-01 01:02:30 100    3 file a
    3 2016-01-02 08:30:30 200  600 file b
    4 2016-01-03 08:25:30  50 <NA> file c

You can of course fill the missing values with character(0) values again if you want to.
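As a possible alternative to the loop, the same idea (take the earliest row of each group, then the first non-missing value of every other column) can be sketched in base R without zoo. The names first_non_na and merge_rows_base are hypothetical helpers introduced only for this illustration, and the sample data is rebuilt with NAs already in place, so treat it as a sketch under those assumptions rather than a drop-in replacement:

```r
# First non-missing element of x, or NA if every element is missing
first_non_na <- function(x) {
  x[!is.na(x)][1]
}

merge_rows_base <- function(df, sort_by, group_by) {
  # Sort by group, then by date, so the first row of each group is the earliest
  df <- df[order(df[[group_by]], df[[sort_by]]), ]
  groups <- split(df, df[[group_by]], drop = TRUE)

  merged <- lapply(groups, function(g) {
    out <- g[1, , drop = FALSE]  # keeps the earliest date and the group label
    for (col in setdiff(names(g), c(sort_by, group_by))) {
      out[[col]] <- first_non_na(g[[col]])
    }
    out
  })

  result <- do.call(rbind, merged)
  rownames(result) <- NULL
  result
}

# Sample data from the question, with missing values already stored as NA
df <- data.frame(
  v1 = as.POSIXct(c("2016-01-01 01:02:30", "2016-01-02 09:02:30",
                    "2016-01-02 08:30:30", "2016-01-03 08:25:30",
                    "2016-01-04 17:20:30"), tz = "UTC"),
  v2 = c("100", NA, "200", "50", NA),
  v3 = c(NA, "3", NA, NA, "600"),
  v4 = c("file a", "file a", "file b", "file c", "file b"),
  stringsAsFactors = FALSE
)

merged <- merge_rows_base(df, "v1", "v4")
print(merged)
```

Unlike the loop version, this one resets the row names, so the result is numbered 1 to 3 instead of keeping the original row indices.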
