I'm working on a PowerShell script that is going to be used in TeamCity as part of a build step. The script has to:
- recursively check all files with the .item extension within a folder,
- read the third line of each file (which contains a GUID) and check whether there are any duplicates among these lines,
- log the path of every file that contains a duplicate GUID, and log the GUID itself,
- make the TeamCity build fail if one or more duplicates are found.
I'm new to PowerShell scripts, but so far I've put together the following, which I expect to do the job:
write-host "start checking unicorn serialization errors." $files = get-childitem "%system.teamcity.build.workingdir%\sitecore\serialization" -recurse -include *.item | {! $_.psiscontainer} | % { $_.fullname } $arrayofitemids = @() $nroffiles = $files.length [bool] $foundduplicates = 0 write-host "there $nroffiles unicorn item files check." foreach ($file in $files) { $thirdlineoffile = (get-content $file)[2 .. 2] if ($arrayofitemids -contains $thirdlineoffile) { $foundduplicates = 1 $itemid = $thirdlineoffile.split(":")[1].trim() write-host "duplicate item id found!" write-host "item file path: $file" write-host "detected duplicate id: $itemid" write-host "-------------" write-host "" } else { $arrayofitemids += $thirdlineoffile } } if ($foundduplicates) { "##teamcity[buildstatus status='failure' text='one or more duplicate id's detected in sitecore serialised items. check build log see files , id's involved.']" exit 1 } write-host "end script checking unicorn serialization errors."
The problem is: it's slow! The folder that has to be checked contains over 14,000 .item files, and it's likely that this number will keep increasing in the future. I understand that opening and reading that many files is an expensive operation, but I didn't expect it to take approximately half an hour to complete. That is way too long, because it would mean the build time of every (snapshot) build is lengthened by half an hour, which is unacceptable. I had hoped the script would complete in a couple of minutes at most.
I can't possibly believe there isn't a faster approach to this. Any help in this area is appreciated!
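As a side note, two incremental tweaks to the original script above already help on their own: tell Get-Content to stop after the third line with -TotalCount, and track the lines already seen in a HashSet instead of scanning a growing array with -contains. A rough, untested sketch, reusing the same folder parameter as above:

$folder = "%system.teamcity.build.workingDir%\Sitecore\serialization"
$seen = New-Object 'System.Collections.Generic.HashSet[string]'
$duplicatesFound = $false

Get-ChildItem $folder -Recurse -Include *.item | Where-Object { ! $_.PSIsContainer } | ForEach-Object {
    # -TotalCount 3 stops reading after the third line instead of loading the whole file
    $thirdLine = (Get-Content $_.FullName -TotalCount 3)[2]

    # HashSet.Add returns $false when the value was already present
    if (-not $seen.Add($thirdLine)) {
        $duplicatesFound = $true
        Write-Host "Duplicate ID line '$thirdLine' in $($_.FullName)"
    }
}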
Solution
Well, the three answers I have received so far have all helped me out here. I first started using the .NET Framework classes directly, and then used a Dictionary to solve the growing-array problem. The time it took to run my own script was about 30 minutes; that went down to 2 minutes by using the .NET Framework classes directly. After applying the Dictionary solution it went down to just 6 or 7 seconds! This is the final script I use:
write-host "start checking unicorn serialization errors." [string[]] $allfilepaths = [system.io.directory]::getfiles("%system.teamcity.build.workingdir%\sitecore\serialization", "*.item", "alldirectories") $idsprocessed = new-object 'system.collections.generic.dictionary[string,string]' [bool] $foundduplicates = 0 $nroffiles = $allfilepaths.length write-host "there $nroffiles unicorn item files check." write-host "" foreach ($filepath in $allfilepaths) { [system.io.streamreader] $sr = [system.io.file]::opentext($filepath) $unused1 = $sr.readline() #read first unused line $unused2 = $sr.readline() #read second unused line [string]$thirdlineoffile = $sr.readline() $sr.close() if ($idsprocessed.containskey($thirdlineoffile)) { $foundduplicates = 1 $itemid = $thirdlineoffile.split(":")[1].trim() $otherfilewithsameid = $idsprocessed[$thirdlineoffile] write-host "---------------" write-host "duplicate item id found!" write-host "detected duplicate id: $itemid" write-host "item file path 1: $filepath" write-host "item file path 2: $otherfilewithsameid" write-host "---------------" write-host "" } else { $idsprocessed.add($thirdlineoffile, $filepath) } } if ($foundduplicates) { "##teamcity[buildstatus status='failure' text='one or more duplicate id|'s detected in sitecore serialised items. check build log see files , id|'s involved.']" exit 1 } write-host "end script checking unicorn serialization errors. no duplicate id's found."
So that's all!
It isn't clear what PowerShell does under the hood when you use high-level commands like Get-ChildItem and Get-Content. Be more explicit and use the .NET Framework classes directly.
Get the paths of the files in the folder using:
[string[]] $files = [System.IO.Directory]::GetFiles($folderPath, "*.yourExt")
Then, rather than using Get-Content, open each file and read only the first three lines. Something like this:
[System.IO.StreamReader] $sr = [System.IO.File]::OpenText($path)
$line = $sr.ReadLine()
while ($line -ne $null)
{
    # do your thing here, and break out of the loop once you know enough
    # ...
    $line = $sr.ReadLine()
}
$sr.Close()
I may have made a mistake or two; I was too lazy to test this on my PC.
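As a small hardening of that loop (again only a sketch, not tested): wrap the reads in try/finally so the reader is closed even if a read throws, and stop after the third line, since that is all the duplicate check needs. $path and $thirdLine are illustrative names here.

[System.IO.StreamReader] $sr = [System.IO.File]::OpenText($path)
try
{
    $null = $sr.ReadLine()        # skip line 1
    $null = $sr.ReadLine()        # skip line 2
    $thirdLine = $sr.ReadLine()   # the line containing the item ID
}
finally
{
    # close the reader even if ReadLine throws
    $sr.Close()
}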
And you may want to consider redesigning your build system to use fewer files. 14,000 files and growing seems unnecessary. If you can consolidate that data into fewer files, it may help performance a lot.
For the duplicate GUID check, use the Dictionary<Guid, string> class, with the string being the file name. Then you can report both files involved whenever you find a duplicate, as in the sketch below.
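A minimal sketch of that idea in PowerShell, assuming the third line looks like "id: {guid}" (which is what the Split/Trim in the scripts above suggests) and reusing the $allFilePaths array from the final script; the other names are illustrative:

$idsSeen = New-Object 'System.Collections.Generic.Dictionary[guid,string]'

foreach ($filePath in $allFilePaths)
{
    # read only the first three lines with a StreamReader, as above
    $sr = [System.IO.File]::OpenText($filePath)
    $null = $sr.ReadLine()
    $null = $sr.ReadLine()
    $thirdLine = $sr.ReadLine()
    $sr.Close()

    # assumes the ID follows a colon on the third line; adjust the parsing if your format differs
    $itemId = [System.Guid]::Parse($thirdLine.Split(":")[1].Trim())

    if ($idsSeen.ContainsKey($itemId))
    {
        Write-Host "Duplicate ID $itemId found in:"
        Write-Host "  $($idsSeen[$itemId])"
        Write-Host "  $filePath"
    }
    else
    {
        $idsSeen.Add($itemId, $filePath)
    }
}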