I have to handle a weird CSV format, and I have been running into problems. The string I have been able to work out so far is
(?:\s*(?:\"([^\"]*)\"|([^,]+))\s*?)+?
My files are broken and irregular, since we have to deal with OCR'd text that is not checked by our users. Therefore, we tend to end up with lots of weird things, like a single " within a field, or a newline character (which is why I am using a regex instead of my previous readline()-based solution). I've gotten it to parse mostly correctly, except that it also captures [,] [,]. How can I get it to not select fields that contain only a single comma? When I try to have it not select commas, it turns "156,000" into [156] and [000].
The test string I've been using is
"156,000","",""i","parts","dog"","","monthly "running" totals"
The ideal capture output I desire is
[156,000],[],[i],[parts],[dog],[],[monthly "running" totals]
I can do with or without the internal quotes, since I can strip them during processing.
Thank you for your time.
Your CSV is indeed irregular and difficult to parse. I suggest you do 2 replacements on your data first.
// Remove invalid doubled "" at the start or end of a field
input = Regex.Replace(input, @"(?<!,|^)""(?=,|$)|(?<=,)""(?!,|$)", "\"");
// Escape inner " with a backslash
input = Regex.Replace(input, @"(?<!,|^)""(?!,|$)", "\\\"");
// At this stage you have proper CSV data, and I suggest using a .NET CSV parser
// to parse the data and get the individual values.
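To show the two replacements and the parsing step end to end, here is a minimal sketch. It rests on a few assumptions that are not in the original: the names CsvRepair, Repair and ParseRecord are made up for illustration, the input is a single record, and the parser is TextFieldParser from Microsoft.VisualBasic.FileIO (on .NET Framework this needs a reference to the Microsoft.VisualBasic assembly). Because that parser expects an inner quote to be escaped by doubling it, the sketch doubles inner quotes instead of prefixing them with a backslash.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using Microsoft.VisualBasic.FileIO; // TextFieldParser, FieldType

// Hypothetical helper, not part of any library.
public static class CsvRepair
{
    // Step 1: a doubled quote at the start or end of a field
    // (""i" or "dog""), but not an empty field ("").
    private static readonly Regex StrayDoubledQuote =
        new Regex(@"(?<!,|^)""(?=,|$)|(?<=,)""(?!,|$)");

    // Step 2: a quote that neither opens nor closes a field.
    private static readonly Regex InnerQuote =
        new Regex(@"(?<!,|^)""(?!,|$)");

    public static string Repair(string record)
    {
        // Collapse each stray doubled quote to a single quote.
        string cleaned = StrayDoubledQuote.Replace(record, "\"");
        // Assumption: escape inner quotes by doubling them,
        // the convention TextFieldParser understands.
        return InnerQuote.Replace(cleaned, "\"\"");
    }

    public static string[] ParseRecord(string record)
    {
        using (var parser = new TextFieldParser(new StringReader(Repair(record))))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Single quotes stand in for double quotes to keep the literal readable.
        string input = "'156,000','',''i','parts','dog'','','monthly 'running' totals'"
            .Replace('\'', '"');

        // Repair(input) yields:
        // "156,000","","i","parts","dog","","monthly ""running"" totals"
        foreach (string field in CsvRepair.ParseRecord(input))
            Console.WriteLine("[" + field + "]");
    }
}
```

Two limits are worth keeping in mind. A stray doubled quote at the very start of the record is not collapsed, because the second alternative of the first pattern requires a preceding comma. An inner quote that sits directly next to a comma cannot be told apart from a real field boundary, so such fields will still be split.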