I am developing software that reads in a Word document (XWPF), changes its grammar according to a prior configuration by the user, and writes the modified text back to a document.
To achieve this, I am using Apache POI (3.10). In most cases it works as expected, but there are rare cases where it doesn't.
To be more specific, my code goes through the whole document, paragraph by paragraph. It changes the content of the paragraphs by setting the text of their runs.
To give you a better picture of what I am talking about:
    XWPFDocument oldDoc = document;
    Iterator<XWPFParagraph> iterator = document.getParagraphsIterator();
    int length = title.length();
    int counter = 0;
    while (iterator.hasNext()) {
        XWPFParagraph paragraph = iterator.next();
        List<XWPFRun> runs = paragraph.getRuns();
        for (int i = 0; i < runs.size(); i++) {
            String text = runs.get(i).toString();
            if (text.contains(title)) {
                // Overwrite the run's text at position 0 with the transformed version.
                runs.get(i).setText(StringFunctions.fromMultiFemaleToSingleMale(text, title, length), 0);
            }
        }
        document.setParagraph(paragraph, counter);
        counter++;
    }
As you can see, I take every run of a paragraph, pass its text through the transformation method, and overwrite the text of the run. There is no problem at this point (I think).
My problem is that there are two or three sentences (or paragraphs) for which the runs do not return the whole text. Below is an example.
paragraph.getText() returns: alle beteiligten weisen daher den notar gem. § 53 beurkg an, die umschreibung gemäß dieser vollmacht durch eigenurkunde erst zu veranlassen, nachdem der verkäufer den eingang des geschuldeten betrages originalschriftlich bestätigt haben oder hilfsweise die käuferinnen die zahlung des vereinbarten kaufpreises (jeweils ohne zinsen) durch bankbestätigung nachgewiesen hat.
while the concatenation of the runs from paragraph.getRuns() returns: alle beteiligten weisen daher den notar gem. § 53 beurkg an, die umschreibung gemäß dieser vollmacht durch eigenurkunde erst zu veranlassen, nachdem die verkäufer den eingang des geschuldeten betrages originalschriftlich bestätigt haben oder hilfsweise die käuferinnen die zahlung des vereinbarten kaufpreises (jeweils ohne zinsen) durch
As you can see here, the last three words are missing. Since I am accessing the runs, the last part is ignored and never transformed by my method. In this case I need to transform the last word to make the sentence grammatically correct.
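To narrow down which paragraphs are affected, a small check like the following could be run before any run is modified. It is only a sketch: the class and method names (RunTextCheck, missingTail) are hypothetical, and it works on plain strings, so in the real code the arguments would come from paragraph.getText() and from calling toString() on each run.

```java
import java.util.List;

final class RunTextCheck {

    /**
     * Compares the full paragraph text with the concatenated run texts.
     * Returns the trailing part of fullText that the runs do not cover,
     * an empty string if the runs cover everything, or null if the two
     * texts diverge before the end (for example because a run was
     * already modified).
     */
    static String missingTail(String fullText, List<String> runTexts) {
        StringBuilder joined = new StringBuilder();
        for (String runText : runTexts) {
            joined.append(runText);
        }
        String concatenated = joined.toString();
        if (!fullText.startsWith(concatenated)) {
            return null; // texts differ somewhere in the middle
        }
        return fullText.substring(concatenated.length());
    }
}
```

A non-empty result for a paragraph would mean that some of its text is not reachable through the runs, which is the situation described above.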
Has anyone experienced something similar?
I tried to find a workaround but didn't find one. One of my coworkers talked to me and mentioned that they had experienced something similar, and that there may be a flaw in the XML-like structure of Word documents that Word can still work with, but that POI does not handle well. That does not sound unlikely, since the document I am using was once in the old Word format (.doc) and was saved as .docx with Word 2007.
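One way to test the coworker's theory would be to look at the paragraph's XML directly (word/document.xml inside the .docx archive) and check whether some text sits in runs that are not direct children of the paragraph. The sketch below uses only the JDK's DOM parser. It rests on an assumption I have not confirmed: that the missing words are in runs wrapped in another element. The class name NestedRunFinder and the w:smartTag wrapper in the usage note are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class NestedRunFinder {

    // Namespace of the main WordprocessingML elements (w:p, w:r, w:t).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns the text of all w:t elements whose enclosing run is NOT a
     * direct child of a w:p element, i.e. text in runs that are wrapped
     * in some other element.
     */
    static String nestedRunText(String paragraphXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(paragraphXml.getBytes(StandardCharsets.UTF_8)));
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            StringBuilder nested = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                Node text = texts.item(i);
                Node run = text.getParentNode();      // normally the w:r
                Node container = run.getParentNode(); // w:p for a direct run
                boolean direct = container != null
                        && "p".equals(container.getLocalName());
                if (!direct) {
                    nested.append(text.getTextContent());
                }
            }
            return nested.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse paragraph XML", e);
        }
    }
}
```

For a paragraph whose last run is wrapped in, say, a w:smartTag element, nestedRunText would return exactly the text of that wrapped run. If it returns the missing words for the paragraph above, the structure of the document is the likely culprit rather than the transformation code.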