How can I use Regex to parse irregular CSV and not select certain characters


I have to handle a weird CSV format, and I have been running into problems. The pattern I have been able to work out so far is:

(?:\s*(?:\"([^\"]*)\"|([^,]+))\s*?)+? 

My files are often broken and irregular, since we have to deal with OCR'd text that our users have not checked. Therefore, we tend to end up with lots of weird things, like a single " within a field, or a newline character (which is why I am using regex instead of my previous readline()-based solution). I've gotten it to parse correctly, except that it captures [,] [,]. How can I get it to not select fields that contain only a single comma? When I try to make it not select commas, it turns "156,000" into [156] and [000].

The test string I've been using is:

"156,000","",""i","parts","dog"","","monthly "running" totals" 

The ideal capture output I want is:

[156,000],[],[i],[parts],[dog],[],[monthly "running" totals] 

I can work with or without the internal quotes, since I can strip them during processing.
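Running the pattern from the question over the test string shows where the comma-only captures come from. The sketch below is C# (the language of the answer's code); the class name OriginalPatternDemo is made up, and the test string is used without its trailing space. The stray quote in dog"" is the culprit: after "dog" is matched, the next match starts on the leftover quote, so "," is read as a quoted field whose content is a single comma, and the same happens once more. The ""i" field has a similar problem and comes out as an empty capture followed by i".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class: runs the question's pattern unchanged.
public static class OriginalPatternDemo
{
    // The question's pattern as a C# verbatim string. Inside @"..." a doubled ""
    // is one literal quote; the original \" is just an escaped quote, so it becomes "" here.
    private const string Pattern = @"(?:\s*(?:""([^""]*)""|([^,]+))\s*?)+?";

    // One value per match: group 1 when the quoted alternative matched, otherwise group 2.
    public static List<string> Capture(string input)
    {
        var captures = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            captures.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return captures;
    }

    public static void Main()
    {
        // The test string from the question (trailing space dropped).
        string test = "\"156,000\",\"\",\"\"i\",\"parts\",\"dog\"\",\"\",\"monthly \"running\" totals\"";
        foreach (string value in Capture(test))
        {
            Console.Write("[" + value + "]");
        }
        Console.WriteLine();
        // prints [156,000][][][i"][parts][dog][,][,][monthly "running" totals"]
    }
}
```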

Thank you for your time.

Your CSV is indeed irregular and difficult to parse. I suggest doing 2 replacements on the data first.

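// Collapse an invalid doubled quote into one quote: "" that ends a field (dog"",) or starts one (,""i)
input = Regex.Replace(input, @"(?<!,|^)""""(?=,|$)|(?<=,)""""(?!,|$)", "\"");

// Escape inner quotes by doubling them (the standard CSV escape)
input = Regex.Replace(input, @"(?<!,|^)""(?!,|$)", "\"\"");

// At this stage you have proper CSV data, and I suggest using a .NET CSV parser
// to parse the data and get the individual values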
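Putting the two replacements together with a field-extraction step, a minimal C# sketch might look like this. It assumes one record per string (splitting a file with embedded newlines into records is not handled here), and that every field is quoted once the replacements have run, which holds for the test string. Instead of a full CSV parser it pulls out each quoted field with a small regex and turns "" back into ". The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class: the two replacements plus a simple extraction step.
public static class IrregularCsvDemo
{
    // Replacement 1: collapse a stray "" into one quote when it closes a field
    // (not after a comma or the string start, followed by a comma or the end), as in dog"",
    // or when it opens one (after a comma, not followed by a comma or the end), as in ,""i
    public static string FixDoubledQuotes(string line) =>
        Regex.Replace(line, @"(?<!,|^)""""(?=,|$)|(?<=,)""""(?!,|$)", "\"");

    // Replacement 2: double every quote that is not at a field boundary,
    // so inner quotes follow the usual CSV rule ("" means a literal ").
    public static string EscapeInnerQuotes(string line) =>
        Regex.Replace(line, @"(?<!,|^)""(?!,|$)", "\"\"");

    // Extraction: assumes every field is quoted. Each match is "...",
    // whose body may contain "" pairs; those are unescaped to a single ".
    public static List<string> ExtractFields(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(line, @"""((?:[^""]|"""")*)"""))
        {
            fields.Add(m.Groups[1].Value.Replace("\"\"", "\""));
        }
        return fields;
    }

    public static void Main()
    {
        // The test string from the question (trailing space dropped).
        string line = "\"156,000\",\"\",\"\"i\",\"parts\",\"dog\"\",\"\",\"monthly \"running\" totals\"";
        string normalized = EscapeInnerQuotes(FixDoubledQuotes(line));
        List<string> fields = ExtractFields(normalized);
        Console.WriteLine(string.Join(",", fields.ConvertAll(f => "[" + f + "]")));
        // prints [156,000],[],[i],[parts],[dog],[],[monthly "running" totals]
    }
}
```

The extraction regex only picks up quoted fields, so an unquoted field would be silently skipped; that is the case where handing the normalized text to a real CSV parser pays off.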


