且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用 Apache POI 从 PowerPoint 幻灯片中删除图表

更新时间:2023-02-14 09:58:53

由于 XSLFChart 处于 @Beta 状态,直到现在还没有明确的 Shape 图表.所以使用 apache poi 我们只能得到包含图表的 XSLFGraphicFrames.但是从幻灯片中删除 XSLFGraphicFrame 也不会删除所有相关的图表部分.所以自上而下移除相关图表部分,意味着从POIXMLDocumentPart 级别向下到PackagePart 级别,直到现在还没有实现.而且由于 POIXMLDocumentPart 中的所有相关方法都受到保护,并且 XSLFChart 本身是最终的,因此解决这个问题并不容易.

Since XSLFChart is in @Beta state, there is not a explicit Shape for a chart until now. So using apache poi we can only get XSLFGraphicFrames which are containing charts. But removing a XSLFGraphicFrame from the slide will not removing all related chart parts too. So the removing the related chart parts top-down, means from POIXMLDocumentPart level down to PackagePart level is not implemented until now. And since all relevant methods in POIXMLDocumentPart are protected and the XSLFChart itself is final there is not really a easy possibility to work around.

以下代码显示了问题.它是这样评论的.

The folowing code shows the problem. It is commented as such.

代码从第一张幻灯片中删除所有图表并删除所有关系和相关部分:/ppt/embeddings/Microsoft_Excel_WorksheetN.xlsx, /ppt/charts/colorsN.xml/ppt/charts/styleN.xml.只有 /ppt/charts/chartN.xml 不能删除,因为它被注释了.

The code removes all charts from the first slide and removes all relationships and related parts that would be: /ppt/embeddings/Microsoft_Excel_WorksheetN.xlsx, /ppt/charts/colorsN.xml and /ppt/charts/styleN.xml. Only /ppt/charts/chartN.xml cannot be removed, as it is commented.

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xslf.usermodel.*;
import org.apache.poi.sl.usermodel.*;

import org.apache.poi.POIXMLDocumentPart;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackageRelationshipCollection;
import org.apache.poi.openxml4j.opc.PackageRelationship;

import org.apache.xmlbeans.XmlObject;

import java.util.Map;
import java.util.HashMap;

import java.util.regex.Pattern;

public class ReadPPTRemoveChart {

 public static void main(String[] args) throws Exception {

  XMLSlideShow slideShow = new XMLSlideShow(new FileInputStream("PPTWithCharts.pptx"));

  XSLFSlide slide = slideShow.getSlides().get(0);

  Map<String, XSLFGraphicFrame> chartFramesToRemove = new HashMap<>();

  for (XSLFShape shape : slide.getShapes()) {
   if (shape instanceof XSLFGraphicFrame) {
    XSLFGraphicFrame graphicframe = (XSLFGraphicFrame)shape;
    XmlObject xmlobject = graphicframe.getXmlObject();
    XmlObject[] graphics = xmlobject.selectPath(
                            "declare namespace a='http://schemas.openxmlformats.org/drawingml/2006/main' " +
                            ".//a:graphic");
    if (graphics.length > 0) { //we have a XSLFGraphicFrame containing a:graphic
     XmlObject graphic = graphics[0];
     XmlObject[] charts = graphic.selectPath(
                           "declare namespace c='http://schemas.openxmlformats.org/drawingml/2006/chart' " +
                           ".//c:chart");
     if (charts.length > 0) { //we have a XSLFGraphicFrame containing c:chart
      XmlObject chart = charts[0];
      String rid = chart.selectAttribute(
                          "http://schemas.openxmlformats.org/officeDocument/2006/relationships", "id")
                          .newCursor().getTextValue();
      chartFramesToRemove.put(rid, graphicframe);
     }
    }
   }
  }

  PackagePart slidepart = slide.getPackagePart();
  OPCPackage opcpackage = slideShow.getPackage();

  for (String rid : chartFramesToRemove.keySet()) {
   //at frist remove the XSLFGraphicFrame
   XSLFGraphicFrame chartFrame = chartFramesToRemove.get(rid);
   slide.removeShape(chartFrame);
   //Here is the problem in my opinion. This **should** remove all related parts too.
   //But since XSLFChart is @Beta, it does not.

   //So we try doing removing the related parts manually.
   //we get the PackagePart of the chart
   PackageRelationship relship = slidepart.getRelationships().getRelationshipByID(rid);
   PackagePart chartpart = slidepart.getRelatedPart(relship);

   //now we get and remove all the relations and related PackageParts from this chartpart
   //this are /ppt/embeddings/Microsoft_Excel_WorksheetN.xlsx, /ppt/charts/colorsN.xml 
   //and /ppt/charts/styleN.xml
   for (PackageRelationship chartrelship : chartpart.getRelationships()) {
    String partname = chartrelship.getTargetURI().toString();
    PackagePart part = opcpackage.getPartsByName(Pattern.compile(partname)).get(0);
    opcpackage.removePart(part);
    chartpart.removeRelationship(chartrelship.getId());
   }
   //this works

   //now we **should** be able removing the relationship to the chartpart from the slide too
   //but this seems not to be possible
   //doing this on PackagePart level works:
   slidepart.removeRelationship(rid);
   for (PackageRelationship sliderelship : slidepart.getRelationships()) {
    System.out.println("rel PP level: " + sliderelship.getTargetURI().toString());
   }
   //all relationships to /ppt/charts/chartN.xml are removed

   //but on POIXMLDocumentPart level this has no effect
   for (POIXMLDocumentPart sliderelpart : slide.getRelations()) {
    System.out.println("rel POIXML level: " + sliderelpart.getPackagePart().getPartName());
   }
   //relationships to /ppt/charts/chartN.xml are **not** removed

   //So we cannot remove the chartpart.
   //If we would do this, then while slideShow.write the 
   //org.apache.poi.xslf.usermodel.XSLFChart.commit in XSLFChart.java fails 
   //because after removing the PackagePart is absent but the relation is still there.
   //opcpackage.removePart(chartpart);

  }


  slideShow.write(new FileOutputStream("PPTWithChartsNew.pptx"));
  slideShow.close();
 }
}

使用PowerPoint打开PPTWithChartsNew.pptx并保存后,不需要的/ppt/charts/styleN.xml部分被删除也因为与他们没有更多的关系.

After opening the PPTWithChartsNew.pptx using PowerPoint and saving it then, the unnecessary /ppt/charts/styleN.xml parts are removed too since there are no more relations to them.

编辑 2017 年 9 月 24 日:

Edit Sep 24 2017:

使用反射找到了解决方案.如前所述,去除相关图表部分需要自上而下,即从POIXMLDocumentPart 级别向下到PackagePart 级别.由于 POIXMLDocumentPart.removeRelation 是受保护的,我们需要使用反射来做到这一点.

Found a solution using reflection. As said, the removing the related chart parts needs to be top-down, means from POIXMLDocumentPart level down to PackagePart level. And since POIXMLDocumentPart.removeRelation is protected, we need doing this using reflection.

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xslf.usermodel.*;
import org.apache.poi.sl.usermodel.*;

import org.apache.poi.POIXMLDocumentPart;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackageRelationshipCollection;
import org.apache.poi.openxml4j.opc.PackageRelationship;

import org.apache.xmlbeans.XmlObject;

import java.util.Map;
import java.util.HashMap;

import java.util.regex.Pattern;

import java.lang.reflect.Method;

public class ReadPPTRemoveChart {

 public static void main(String[] args) throws Exception {

  XMLSlideShow slideShow = new XMLSlideShow(new FileInputStream("PPTWithCharts.pptx"));

  XSLFSlide slide = slideShow.getSlides().get(0);

  Map<String, XSLFGraphicFrame> chartFramesToRemove = new HashMap<>();

  for (XSLFShape shape : slide.getShapes()) {
   if (shape instanceof XSLFGraphicFrame) {
    XSLFGraphicFrame graphicframe = (XSLFGraphicFrame)shape;
    XmlObject xmlobject = graphicframe.getXmlObject();
    XmlObject[] graphics = xmlobject.selectPath(
                            "declare namespace a='http://schemas.openxmlformats.org/drawingml/2006/main' " +
                            ".//a:graphic");
    if (graphics.length > 0) { //we have a XSLFGraphicFrame containing a:graphic
     XmlObject graphic = graphics[0];
     XmlObject[] charts = graphic.selectPath(
                           "declare namespace c='http://schemas.openxmlformats.org/drawingml/2006/chart' " +
                           ".//c:chart");
     if (charts.length > 0) { //we have a XSLFGraphicFrame containing c:chart
      XmlObject chart = charts[0];
      String rid = chart.selectAttribute(
                          "http://schemas.openxmlformats.org/officeDocument/2006/relationships", "id")
                          .newCursor().getTextValue();
      chartFramesToRemove.put(rid, graphicframe);
     }
    }
   }
  }

  PackagePart slidepart = slide.getPackagePart();
  OPCPackage opcpackage = slideShow.getPackage();

  for (String rid : chartFramesToRemove.keySet()) {
   //at frist remove the XSLFGraphicFrame
   XSLFGraphicFrame chartFrame = chartFramesToRemove.get(rid);
   slide.removeShape(chartFrame);
   //Here is the problem in my opinion. This **should** remove all related parts too.
   //But since XSLFChart is @Beta, it does not.

   //So we try doing removing the related parts manually.

   //we get the PackagePart of the chart
   PackageRelationship relship = slidepart.getRelationships().getRelationshipByID(rid);
   PackagePart chartpart = slidepart.getRelatedPart(relship);

   //now we get and remove all the relations and related PackageParts from this chartpart
   //this are /ppt/embeddings/Microsoft_Excel_WorksheetN.xlsx, /ppt/charts/colorsN.xml 
   //and /ppt/charts/styleN.xml
   for (PackageRelationship chartrelship : chartpart.getRelationships()) {
    String partname = chartrelship.getTargetURI().toString();
    PackagePart part = opcpackage.getPartsByName(Pattern.compile(partname)).get(0);
    opcpackage.removePart(part);
    chartpart.removeRelationship(chartrelship.getId());
   }

   //now we remove the chart part from the slide part
   //We need doing this on POIXMLDocumentPart level. 
   //Since POIXMLDocumentPart.removeRelation is protected, we need doing this using reflection
   XSLFChart chart = (XSLFChart)slide.getRelationById(rid);
   Method removeRelation = POIXMLDocumentPart.class.getDeclaredMethod("removeRelation", POIXMLDocumentPart.class); 
   removeRelation.setAccessible(true); 
   removeRelation.invoke(slide, chart);

  }

  slideShow.write(new FileOutputStream("PPTWithChartsNew.pptx"));
  slideShow.close();
 }
}