I have a CSV file with the following content:
1,"hello, there",i have csv in which,"only when ""double quote"" or comma there in content",it wrapped in double quotes,otherwise not,something 1/2" not wrapped in double quotes.
I tried OpenCSV and other CSV libraries, but none of them parsed it correctly.
I also tried a regular expression quoted in a Stack Overflow question, and it didn't work either.
However, when I open the file in Excel, it works fine. Can anyone give me a hint on how to parse this CSV file?
Note: when the content contains a comma, it is wrapped in a text qualifier (double quotes). When such content is wrapped in double quotes and a double quote is part of the content, that double quote is escaped; in other words, it is changed to a doubled double quote (""). But if the content merely has a double quote, such as 1/2", it is not wrapped in text qualifiers.
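The rules in the note above amount to a small state machine: a double quote opens a quoted field only when it is the first character of that field; inside a quoted field, "" stands for a literal quote and a single " closes the field; anywhere else a double quote is an ordinary character. A minimal JDK-only sketch follows, assuming one record per line (no line breaks inside quoted values); the names QuotedCsvLine and parseLine are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvLine {

    // Splits one CSV line (assumption: no embedded line breaks) using these rules:
    // - a field starting with " is quoted; inside it "" is a literal quote and a lone " closes it
    // - in an unquoted field every character, including a bare " as in 1/2", is literal
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote: "" becomes "
                        i++;               // skip the second quote of the pair
                    } else {
                        inQuotes = false;  // closing text qualifier
                    }
                } else {
                    field.append(c);       // commas inside quotes are kept
                }
            } else if (c == ',') {
                fields.add(field.toString()); // separator outside quotes ends the field
                field.setLength(0);
                atFieldStart = true;
                continue;
            } else if (c == '"' && atFieldStart) {
                inQuotes = true;           // opening text qualifier
            } else {
                field.append(c);           // ordinary character, including a bare "
            }
            atFieldStart = false;
        }
        fields.add(field.toString());      // last field (empty if the line ends with a comma)
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"hello, there\",i have csv in which,\"only when \"\"double quote\"\" or comma there in content\",it wrapped in double quotes,otherwise not,something 1/2\" not wrapped in double quotes.";
        for (String f : parseLine(line)) {
            System.out.println(f);
        }
    }
}
```

On the sample line this yields seven fields, with the doubled quotes collapsed to single ones and the unquoted 1/2" kept intact. A full library is still preferable for real files (multi-line values, different delimiters, streaming).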
Please advise.
When parsed, the above content should produce the following fields (one per line):
1
hello, there
i have csv in which
only when "double quote" or comma there in content
it wrapped in double quotes
otherwise not
something 1/2" not wrapped in double quotes.
I tried using OpenCSV, and I also tried splitting with this regular expression:
",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
but it didn't help.
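A likely reason the regular expression fails: its lookahead only accepts a comma that is followed by an even number of double quotes up to the end of the line, which assumes every quote belongs to a balanced pair. A bare 1/2" leaves an odd count, so real separators are rejected and, on the sample line above, the comma inside "hello, there" is accepted instead. A minimal demonstration on a hypothetical three-field line (the class name RegexSplitDemo is illustrative):

```java
import java.util.Arrays;

class RegexSplitDemo {
    // The regex from the question: split on a comma only if an even number
    // of double quotes follows it, up to the end of the line.
    static final String REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        // Intended fields: a | b 1/2" c | d
        String line = "a,b 1/2\" c,d";
        String[] parts = line.split(REGEX);
        // One quote follows the first comma (odd), so it is not treated as a separator.
        System.out.println(parts.length + " " + Arrays.toString(parts)); // prints: 2 [a,b 1/2" c, d]
    }
}
```

No single regular expression can tell a bare " apart from a text qualifier without tracking whether it appears at the start of a field, which is why a character-by-character parser (or a CSV library) is the safer route.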
My data is below:
product,,1/2" 18v cordless xrp li-lon drill/drive,p,2510906459,,dewalt tools,,,<br><img src="http://example.com/image.png"><br><br><p><b>unit of measure: ea<br><br> qty per unit of measure: 1<br><br> minimum order quantity: 1<br></p></b>dewalt tools dcd960kl - 1/2" 18v cordless xrp li-lon drill/driver kit - xrp™ cordless drills - best in class length improved balance and better control|led worklight provides increased visibility in confined spaces|patented 3-speed all-metal transmission matches tool task fastest application speed and improved - equal 115-dcd960kl,
I want it parsed as below (I used <blank> to represent an empty cell as seen in Excel), one field per line:
product
<blank>
1/2" 18v cordless xrp li-lon drill/drive
p
2510906459
<blank>
dewalt tools
<blank>
<blank>
<br><img src="http://example.com/image.png"><br><br><p><b>unit of measure: ea<br><br> qty per unit of measure: 1<br><br> minimum order quantity: 1<br></p></b>dewalt tools dcd960kl - 1/2" 18v cordless xrp li-lon drill/driver kit - xrp™ cordless drills - best in class length improved balance and better control|led worklight provides increased visibility in confined spaces|patented 3-speed all-metal transmission matches tool task fastest application speed and improved - equal 115-dcd960kl
I had no problems parsing your input with univocity-parsers:
```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.Reader;
import java.io.StringReader;

public class Example {
    public static void main(String[] args) {
        String input = "product,,1/2\" 18v cordless xrp li-lon drill/drive,p,2510906459,,dewalt tools,,,<br><img src=\"http://example.com/image.png\"><br><br><p><b>unit of measure: ea<br><br> qty per unit of measure: 1<br><br> minimum order quantity: 1<br></p></b>dewalt tools dcd960kl - 1/2\" 18v cordless xrp li-lon drill/driver kit - xrp™ cordless drills - best in class length improved balance and better control|led worklight provides increased visibility in confined spaces|patented 3-speed all-metal transmission matches tool task fastest application speed and improved - equal 115-dcd960kl,";
        Reader reader = new StringReader(input);

        CsvParserSettings settings = new CsvParserSettings(); // many options here, check the tutorial
        settings.setNullValue("<blank>"); // empty values are parsed as null; print them as <blank>

        // parseAll returns one String[] per record; this input has a single record
        String[] row = new CsvParser(settings).parseAll(reader).get(0);
        for (String element : row) {
            System.out.println(element);
        }
    }
}
```
Output:
product
<blank>
1/2" 18v cordless xrp li-lon drill/drive
p
2510906459
<blank>
dewalt tools
<blank>
<blank>
<br><img src="http://example.com/image.png"><br><br><p><b>unit of measure: ea<br><br> qty per unit of measure: 1<br><br> minimum order quantity: 1<br></p></b>dewalt tools dcd960kl - 1/2" 18v cordless xrp li-lon drill/driver kit - xrp™ cordless drills - best in class length improved balance and better control|led worklight provides increased visibility in confined spaces|patented 3-speed all-metal transmission matches tool task fastest application speed and improved - equal 115-dcd960kl
<blank>
Disclaimer: I'm the author of this library. It's open-source and free (Apache 2.0 license).