且构网

Sharing the ins and outs of software development...

How to detect that text has been deleted as a tracked change?

Updated: 2023-12-05 10:57:34


I am using Apache POI to read Word files, and that works.

I read the document text using a list of XWPFRun instances, and that is working fine.

But if track changes are enabled for the document, I also get XWPFRun instances for text that has been deleted, as long as the deletion has not been accepted yet. I would like to exclude this text.

So is there a way to detect the track-change status of an XWPFRun, or, even better, a way to parse the document as if all tracked changes had been accepted?

This is not supported by XWPFRun yet, but we can determine whether a text run is marked as deleted.

The XML of a normal text run looks like this:

<w:r>
 <w:t>Lorem</w:t>
</w:r>

Deleted text runs look like:

<w:del w:id="0" w:author="axel" w:date="2020-04-23T18:57:00Z">
 <w:r w:rsidDel="00C63AEB">
  <w:delText>ipsum</w:delText>
 </w:r>
</w:del>

So deleted runs sit inside a del element, but that element is tricky to get at from an XWPFRun.
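To make the nesting concrete, here is a minimal sketch that works on the raw WordprocessingML with the JDK's own DOM parser (javax.xml). This is a swapped-in technique, not something XWPFRun offers. `DelAncestorCheck`, `parse` and `isInsideDel` are hypothetical names for illustration. The XML in `main` is the snippet shown above, wrapped in a `w:p`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of POI): finds w:r elements nested in a w:del.
final class DelAncestorCheck {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses WordprocessingML with namespace awareness so local names and URIs are available.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Walks up the ancestors of a run and reports whether any of them is a w:del.
    static boolean isInsideDel(Element run) {
        for (Node n = run.getParentNode(); n != null; n = n.getParentNode()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(n.getNamespaceURI())
                    && "del".equals(n.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The normal and the deleted run from the XML above, wrapped in a paragraph.
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:r><w:t>Lorem</w:t></w:r>"
                + "<w:del w:id=\"0\" w:author=\"axel\" w:date=\"2020-04-23T18:57:00Z\">"
                + "<w:r w:rsidDel=\"00C63AEB\"><w:delText>ipsum</w:delText></w:r>"
                + "</w:del></w:p>";
        NodeList runs = parse(xml).getElementsByTagNameNS(W_NS, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            System.out.println(run.getTextContent() + " deleted=" + isInsideDel(run));
        }
    }
}
```

Checking ancestors rather than only the direct parent is a deliberate choice, so the check still holds if a run ends up wrapped in another element inside the w:del.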

However, a normal text run keeps its text in a t element, while a deleted text run keeps it in a delText element. So XWPFRun.getText(0) returns null for deleted text runs, because it only looks at the t elements of the run. XWPFRun.text() and XWPFRun.toString(), on the other hand, also return the text of deleted runs, because those methods traverse all elements in the run that contain text.
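The same t versus delText distinction can be illustrated at the XML level. This sketch uses the JDK's DOM parser instead of POI, which is an assumption made for illustration. `visibleText` collects only `w:t` children, which is what the t-based reading sees. `allText` also collects `w:delText`, which is why deleted text appears in the traversal that `text()` performs. The class and method names are hypothetical. Unlike the real `text()`, the sketch ignores tabs, breaks and other run content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of POI): contrasts w:t-only text with w:t plus w:delText text.
final class RunTextKinds {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses a single w:r element; the xmlns:w declaration must be on it.
    static Element parseRun(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse run XML", e);
        }
    }

    // Concatenates the text of direct w:* children whose local name is in localNames.
    static String childText(Element run, String... localNames) {
        StringBuilder sb = new StringBuilder();
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !W_NS.equals(n.getNamespaceURI())) {
                continue;
            }
            for (String name : localNames) {
                if (name.equals(n.getLocalName())) {
                    sb.append(n.getTextContent());
                }
            }
        }
        return sb.toString();
    }

    // Only w:t children: empty for a deleted run, which has no w:t at all.
    static String visibleText(Element run) {
        return childText(run, "t");
    }

    // w:t and w:delText children: deleted text is included as well.
    static String allText(Element run) {
        return childText(run, "t", "delText");
    }

    public static void main(String[] args) {
        Element deleted = parseRun("<w:r xmlns:w=\"" + W_NS + "\" w:rsidDel=\"00C63AEB\">"
                + "<w:delText>ipsum</w:delText></w:r>");
        System.out.println("visible: '" + visibleText(deleted) + "'");
        System.out.println("all: '" + allText(deleted) + "'");
    }
}
```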

Furthermore, deleted text runs have an rsidDel attribute in their CTR object, whereas getting that attribute from runs that are not deleted returns null.

And getDelTextList() on the CTR of a text run returns an empty list for runs that are not deleted, but a non-empty list for deleted runs.
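Both CTR-level signals correspond directly to the XML. `getRsidDel()` reads the `w:rsidDel` attribute, and `getDelTextList()` lists the `w:delText` children. The following sketch checks the same two signals on a raw run element with the JDK's DOM API. It does not use POI, and `DeletedRunSignals` and its methods are hypothetical names. Rsid attributes may be missing in files not written by Word, so `looksDeleted` treats either signal as sufficient. That is a design assumption, not POI behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of POI): the rsidDel and delText signals on raw run XML.
final class DeletedRunSignals {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses a single w:r element; the xmlns:w declaration must be on it.
    static Element parseRun(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse run XML", e);
        }
    }

    // Counterpart of getRsidDel() != null: is the w:rsidDel attribute present?
    static boolean hasRsidDel(Element run) {
        return run.hasAttributeNS(W_NS, "rsidDel");
    }

    // Counterpart of getDelTextList().size(): number of direct w:delText children.
    static int delTextCount(Element run) {
        int count = 0;
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(n.getNamespaceURI())
                    && "delText".equals(n.getLocalName())) {
                count++;
            }
        }
        return count;
    }

    // Design assumption: either signal is treated as enough to call the run deleted.
    static boolean looksDeleted(Element run) {
        return hasRsidDel(run) || delTextCount(run) > 0;
    }

    public static void main(String[] args) {
        String normal = "<w:r xmlns:w=\"" + W_NS + "\"><w:t>Lorem</w:t></w:r>";
        String deleted = "<w:r xmlns:w=\"" + W_NS + "\" w:rsidDel=\"00C63AEB\">"
                + "<w:delText>ipsum</w:delText></w:r>";
        for (String xml : new String[] {normal, deleted}) {
            Element run = parseRun(xml);
            System.out.println(run.getTextContent() + ": rsidDel=" + hasRsidDel(run)
                    + ", delText=" + delTextCount(run) + ", deleted=" + looksDeleted(run));
        }
    }
}
```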

Example that detects deleted runs in WordExample.docx:

import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;

public class WordReadDeletedRuns {

 public static void main(String[] args) throws Exception {

  String inFilePath = "./WordExample.docx";

  // try-with-resources closes the input stream and the document, even on errors
  try (FileInputStream in = new FileInputStream(inFilePath);
       XWPFDocument document = new XWPFDocument(in)) {
   for (IBodyElement bodyElement : document.getBodyElements()) {
    if (bodyElement instanceof XWPFParagraph) {
     XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
     for (IRunElement runElement : paragraph.getIRuns()) {
      if (runElement instanceof XWPFRun) {
       XWPFRun run = (XWPFRun)runElement;
       System.out.println("Text run found: " + run.text()); // includes the text of deleted runs
       System.out.println(run.getText(0)); // null for deleted runs (they have no t element)
       System.out.println(run.getCTR().getRsidDel()); // null for runs that are not deleted, byte[] for deleted runs
       System.out.println(run.getCTR().getDelTextList().size()); // 0 for runs that are not deleted, > 0 for deleted runs
      }
     }
    }
   }
  }
 }
}
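The question also asks about reading the document as if all changes had been accepted. One possible workaround, sketched here under clear assumptions, bypasses POI. It reads `word/document.xml` from the .docx zip with the JDK's `ZipInputStream` and DOM parser and collects only `w:t` text that is not inside a `w:del`. `AcceptedDeletionsReader` and its methods are hypothetical. The sketch handles only the deletion markup shown above. Inserted text is kept, because inserted runs use ordinary `w:t`. It does not handle other revision types, tabs, breaks, headers, footers or text boxes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of POI): paragraph text as if all deletions were accepted.
final class AcceptedDeletionsReader {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds word/document.xml in the .docx (a zip) and extracts its paragraphs; closes docx.
    static List<String> acceptedParagraphs(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                    factory.setNamespaceAware(true);
                    // The parser sees only the bytes of the current zip entry.
                    return paragraphs(factory.newDocumentBuilder().parse(zip));
                }
            }
            throw new IllegalArgumentException("No word/document.xml in the archive");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot parse word/document.xml", e);
        }
    }

    // One string per w:p: the w:t text outside w:del (w:delText is never matched by "t").
    static List<String> paragraphs(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList ps = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < ps.getLength(); i++) {
            NodeList ts = ((org.w3c.dom.Element) ps.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < ts.getLength(); j++) {
                if (!insideDel(ts.item(j))) {
                    sb.append(ts.item(j).getTextContent());
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    // Extra safety: skip any w:t that nevertheless sits inside a w:del.
    static boolean insideDel(Node node) {
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            if (W_NS.equals(n.getNamespaceURI()) && "del".equals(n.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "./WordExample.docx";
        try (InputStream in = new FileInputStream(path)) {
            acceptedParagraphs(in).forEach(System.out::println);
        }
    }
}
```

Run it with a path argument, for example `java AcceptedDeletionsReader WordExample.docx`. Without an argument it falls back to ./WordExample.docx, the same file the POI example reads.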