excel - Duplicate Removal using an array in vba -


this question has answer here:

hi have used code answer question 'how delete duplicates between 2 excel sheets vba' , tried alter code suite own vba script. code delete rows same amount in array deleting first 11 rows. new vba , not understanding why doing this. below copy of script using.

    dim overlaywb workbook       'overlay_workbook     dim formattedwb workbook     'formatted_workbook     dim formattedws worksheet    'current active worksheet (formatted)     dim overlayws worksheet      'worksheet in overlay     dim lastrowformatted long     dim lastrowoverlay long      dim targetarray, searcharray     dim targetrange range     dim x long      'update these 4 lines if target , search ranges change     dim targetsheetname string: targetsheetname = "formatted"     dim targetsheetcolumn string: targetsheetcolumn = "g22"     dim searchsheetname string: searchsheetname = "overlay"     dim searchsheetcolumn string: searchsheetcolumn = "g22"        'open overlay workbook     set overlaywb = workbooks.open("c:\documents\templates\overlaye.xls") 'path workbook overlay copy     set formattedws = workbooks("formatted").sheets("dlt formatted")     set overlayws = workbooks("overlay").sheets("overlay")     set formattedwb = thisworkbook   'load target array     formattedws         set targetrange = .range(.range(targetsheetcolumn & "7"), _                 .range(targetsheetcolumn & rows.count).end(xlup))         targetarray = targetrange     end   'load search array     overlayws         searcharray = .range(.range(searchsheetcolumn & "7"), _                 .range(searchsheetcolumn & rows.count).end(xlup))     end       dim dict object     set dict = createobject("scripting.dictionary")     'populate dictionary search array     if isarray(searcharray)         x = 1 ubound(searcharray)             if not dict.exists(searcharray(x, 1))                 dict.add searcharray(x, 1), 1             end if         next     else         if not dict.exists(searcharray)             dict.add searcharray, 1         end if     end if      'delete rows values found in dictionary     if isarray(targetarray)         'step backwards avoid deleting wrong rows.         x = ubound(targetarray) 1 step -1             if dict.exists(targetarray(x, 1))                 targetrange.cells(x).entirerow.delete             end if         next     else         if dict.exists(targetarray)             targetrange.entirerow.delete         end if     end if 

can me appreicated, have not altered scripting correctly, or missing something?

it seems have become received wisdom on site task of deleting rows best achieved looping through range bottom top , deleting each individual row whenever criteria met. yet quite inefficient method. compare these 2 snippets, example:

dim r long dim clock ctimer  set clock = new ctimer  clock.startcounter application.screenupdating = false r = 1 10000     sheet1.cells(1, 1).entirerow.delete next application.screenupdating = true debug.print "row row:"; clock.timeelapsed; "ms"  clock.startcounter application.screenupdating = false sheet1.range("a1:a10000").entirerow.delete application.screenupdating = true debug.print "range:"; clock.timeelapsed; "ms" 

output follows:

row row: 2876.18174935641 ms

range: 15.2153416146466 ms

these results aren't surprising it's fair generalise greater number of individual interactions worksheet, slower programme be.

what's shame of posts removing duplicates go great lengths read worksheet values , reference items arrays in order avoid excessive sheet interactions. , yet of efficiency gains lost inefficient row deletion. what's misleading these posts purport "quick".

some might argue want carry out tasks on worksheet in between row deletions. however, vba ranges update addresses in same way excel formula range does. have @ code below example of this:

dim cell range  set cell = sheet1.range("a3") debug.print "address before deletion:"; cell.address sheet1.range("a1").entirerow.delete debug.print "address after deletion:"; cell.address 

output is:

address before deletion:$a$3

address after deletion:$a$2

so following code still delete cells "a4" , "a6" , original cells "a8" , "a10", example:

dim rng1 range dim rng2 range  set rng1 = sheet1.range("a4, a6") set rng2 = sheet1.range("a8, a10") rng1.entirerow.delete sheet1.range("a5").insert xldown rng2.entirerow.delete 

for practical application, op genuinely answer question of 'how delete duplicates between 2 excel sheets vba'? following code:

private sub removematchingrowsasbatch(refrange range, targetrange range)     dim refvalues variant     dim refitems collection     dim refindex long     dim refkey string     dim targetvalues variant     dim targetindex long     dim targetkey string     dim test variant     dim delrows range     dim added boolean      'read datasets arrays     refvalues = refrange.value2     targetvalues = targetrange.value2      'loop through target values , check if items match     set refitems = new collection     targetindex = 1 ubound(targetvalues, 1)         if not isempty(targetvalues(targetindex, 1))             targetkey = cstr(targetvalues(targetindex, 1))             test = empty: on error resume next             test = refitems(targetkey): on error goto 0              'check if existing ref item list has match             if not isempty(test)                 targetrange.cells(targetindex, 1).entirerow.delete                 if delrows nothing                     set delrows = targetrange.cells(targetindex, 1)                 else                     set delrows = union(delrows, targetrange.cells(targetindex, 1))                 end if             else                 'there no match continue reading reference list.                 while refindex < ubound(refvalues, 1)                     refindex = refindex + 1                     if not isempty(refvalues(refindex, 1))                         'test new reference item isn't duplicate.                         refkey = cstr(refvalues(refindex, 1))                         on error resume next                         refitems.add refkey, refkey                         added = err.number = 0                         on error goto 0                         'it isn't duplicate check match.                         if added                             if refkey = targetkey                                 if delrows nothing                                     set delrows = targetrange.cells(targetindex, 1)                                 else                                     set delrows = union(delrows, targetrange.cells(targetindex, 1))                                 end if                                 exit                             end if                         end if                     end if                 loop               end if         end if     next      'now delete rows in 1 'batch'.     if not delrows nothing         delrows.entirerow.delete     end if  end sub 

in actual fact, there misunderstandings role , function of variables in op's code, , other respondents have pointed out. however, in interest of completeness, correct reading routine his/her 2 worksheets might below:

public sub readsheets()     dim reffilepath string     dim refbookname string     dim refbook workbook     dim refsheet worksheet     dim refsheetname string     dim refcol string     dim refrow long     dim refrange range     dim refvalues variant     dim targetbook workbook     dim targetsheet worksheet     dim targetsheetname string     dim targetcol string     dim targetrow long     dim targetrange range     dim targetvalues variant      'define sheet variables.     reffilepath = "z:\ambie\vba"     refbookname = "reference.xlsx"     refsheetname = "data"     refcol = "a"     refrow = "2"     targetsheetname = "uniques"     targetcol = "b"     targetrow = "3"      'define excel sheet objects.     on error resume next     set refbook = workbooks(refbookname)     on error goto 0     if refbook nothing         set refbook = workbooks.open(reffilepath & "\" & refbookname)     end if     set refsheet = refbook.worksheets(refsheetname)     set targetbook = thisworkbook     set targetsheet = targetbook.worksheets(targetsheetname)      'read both datasets.     refsheet         set refrange = .range(.cells(refrow, refcol), _                               .cells(.rows.count, refcol).end(xlup))     end      targetsheet         set targetrange = .range(.cells(targetrow, targetcol), _                                  .cells(.rows.count, targetcol).end(xlup))     end      'call removal routine here     removematchingrowsasbatch refrange, targetrange end sub 

Comments