How to create PDFs with only font information and embed the actual font when merging them into a single PDF

Updated: 2023-02-12 22:12:04

I create pdfs and concatenate them into a single pdf.

My resulting pdf is a lot bigger than I had expected in file size.

I realised that my PDF contains a ton of duplicate fonts, and that this is the reason for the unexpectedly big file.

Here is my question:

I would like to create PDFs that contain only font information, without the embedded font program, so that they rely on the Windows system font.

When I merge them into a single PDF, I want to insert the actual font that the PDF needs.

If possible, please let me know how to do it.

Thanks.

I've created the MergeAndAddFont example to explain the different options.

We'll create PDFs using this code snippet:

public void createPdf(String filename, String text, boolean embedded, boolean subset) throws DocumentException, IOException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter.getInstance(document, new FileOutputStream(filename));
    // step 3
    document.open();
    // step 4
    BaseFont bf = BaseFont.createFont(FONT, BaseFont.WINANSI, embedded);
    bf.setSubset(subset);
    Font font = new Font(bf, 12);
    document.add(new Paragraph(text, font));
    // step 5
    document.close();
}

We use this code to create 3 test files, 1, 2, 3 and we'll do this 3 times: A, B, C.

The first time, we use the parameters embedded = true and subset = true, resulting in the files testA1.pdf with text "abcdefgh" (3.71 KB), testA2.pdf with text "ijklmnopq" (3.49 KB) and testA3.pdf with text "rstuvwxyz" (3.55 KB). The font is embedded and the file size is relatively low because we only embed a subset of the font.

Now we merge these files using the following code, using the smart parameter to indicate whether we want to use PdfCopy or PdfSmartCopy:

public void mergeFiles(String[] files, String result, boolean smart) throws IOException, DocumentException {
    Document document = new Document();
    PdfCopy copy;
    if (smart)
        copy = new PdfSmartCopy(document, new FileOutputStream(result));
    else
        copy = new PdfCopy(document, new FileOutputStream(result));
    document.open();
    // size the array from the input so any number of files can be merged
    PdfReader[] reader = new PdfReader[files.length];
    for (int i = 0; i < files.length; i++) {
        reader[i] = new PdfReader(files[i]);
        copy.addDocument(reader[i]);
    }
    document.close();
    for (int i = 0; i < reader.length; i++) {
        reader[i].close();
    }
}

When we merge the documents, be it with PdfCopy or PdfSmartCopy, the different subsets of the same font are copied as separate objects into the resulting PDF testA_merged1.pdf / testA_merged2.pdf (both 9.75 KB).

This is the problem you are experiencing: PdfSmartCopy can detect and reuse identical objects, but the different subsets of the same font aren't identical and iText can't merge different subsets of the same font into one font.
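To make the distinction concrete, here is a minimal conceptual sketch in plain Java. It is not iText's actual implementation: it only assumes that duplicate detection works on the bytes of a stream, and it uses a SHA-256 digest as a hypothetical content key. The byte arrays are placeholders, not real font data.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StreamDedupSketch {

    // Hypothetical content key: the Base64 form of the stream's SHA-256 digest.
    static String contentKey(byte[] stream) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(sha.digest(stream));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Counts how many streams remain once byte-identical streams are shared.
    static int countAfterDedup(List<byte[]> streams) {
        Set<String> keys = new HashSet<>();
        for (byte[] stream : streams) {
            keys.add(contentKey(stream));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a fully embedded font program.
        byte[] full = "glyphs a-z".getBytes(StandardCharsets.UTF_8);
        List<byte[]> fullCopies = Arrays.asList(full.clone(), full.clone(), full.clone());

        // Placeholder bytes standing in for three different subsets of that font.
        List<byte[]> subsets = Arrays.asList(
                "glyphs abcdefgh".getBytes(StandardCharsets.UTF_8),
                "glyphs ijklmnopq".getBytes(StandardCharsets.UTF_8),
                "glyphs rstuvwxyz".getBytes(StandardCharsets.UTF_8));

        System.out.println("full font copies kept: " + countAfterDedup(fullCopies));
        System.out.println("subsets kept: " + countAfterDedup(subsets));
    }
}
```

Three byte-identical copies collapse to a single entry, while the three subsets stay separate because their bytes differ. Combining them would require rebuilding a font from the union of the glyphs, which is a different operation from detecting duplicates.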

The second time, we use the parameters embedded = true and subset = false, resulting in the files testB1.pdf (21.38 KB), testB2.pdf (21.38 KB) and testB3.pdf (21.38 KB). The font is fully embedded and the file size of a single file is a lot bigger than before because the full font is embedded.

If we merge the files using PdfCopy, the font will be present in the merged document redundantly, resulting in the bloated file testB_merged1.pdf (63.16 KB). This is definitely not what you want!

However, if we use PdfSmartCopy, iText detects an identical font stream and reuses it, resulting in testB_merged2.pdf (21.95 KB), which is much smaller than what we had with PdfCopy. It's still bigger than the document with the subsetted fonts, but if you're concatenating a huge number of files, the result will be better if you embed the complete font, because the full font is stored once while every subset is stored separately.
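How many files is "a huge number"? The following back-of-the-envelope sketch estimates it from the sizes reported in this answer. It assumes a simple linear model in which every merged file adds one subset of roughly equal size, and it takes the 2.6 KB merged file without any embedded font (reported further down) as the baseline. Real subset sizes depend on the glyphs used, so treat the result as a rough indication only. The class and method names are made up for this illustration.

```java
final class FontSizeBreakEven {

    // Sizes in KB as reported in this answer for the three-file merges.
    static final double MERGED_NO_FONT_KB = 2.6;       // testC_merged1.pdf
    static final double MERGED_SUBSETS_KB = 9.75;      // testA_merged1.pdf / testA_merged2.pdf
    static final double MERGED_FULL_SMART_KB = 21.95;  // testB_merged2.pdf
    static final int FILES_MERGED = 3;

    // Approximate size one embedded subset adds to the merged file.
    static double subsetCostPerFileKb() {
        return (MERGED_SUBSETS_KB - MERGED_NO_FONT_KB) / FILES_MERGED;
    }

    // Approximate size the single shared full font adds to the merged file.
    static double fullFontCostKb() {
        return MERGED_FULL_SMART_KB - MERGED_NO_FONT_KB;
    }

    // Smallest file count at which the subsets together outweigh one full font.
    static int breakEvenFileCount() {
        return (int) Math.ceil(fullFontCostKb() / subsetCostPerFileKb());
    }

    public static void main(String[] args) {
        System.out.printf("subset cost per file: about %.2f KB%n", subsetCostPerFileKb());
        System.out.printf("full font cost, stored once: about %.2f KB%n", fullFontCostKb());
        System.out.println("break-even file count: " + breakEvenFileCount());
    }
}
```

Under these assumptions each subset costs about 2.38 KB and the full font about 19.35 KB, so from roughly nine files onward the fully embedded font merged with PdfSmartCopy gives the smaller result.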

The third time, we use the parameters embedded = false and subset = false, resulting in the files testC1.pdf (2.04 KB), testC2.pdf (2.04 KB) and testC3.pdf (2.04 KB). The font isn't embedded, resulting in an excellent file size, but if you compare with one of the previous results, you'll see that the font looks completely different.

We merge the files using PdfSmartCopy, resulting in testC_merged1.pdf (2.6 KB). Again, we have an excellent file size, but again we have the problem that the font isn't visualized correctly.

To fix this, we need to embed the font:

private void embedFont(String merged, String fontfile, String result) throws IOException, DocumentException {
    // the font file
    RandomAccessFile raf = new RandomAccessFile(fontfile, "r");
    byte fontbytes[] = new byte[(int)raf.length()];
    raf.readFully(fontbytes);
    raf.close();
    // create a new stream for the font file
    PdfStream stream = new PdfStream(fontbytes);
    stream.flateCompress();
    stream.put(PdfName.LENGTH1, new PdfNumber(fontbytes.length));
    // create a reader object
    PdfReader reader = new PdfReader(merged);
    int n = reader.getXrefSize();
    PdfObject object;
    PdfDictionary font;
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(result));
    PdfName fontname = new PdfName(BaseFont.createFont(fontfile, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED).getPostscriptFontName());
    for (int i = 0; i < n; i++) {
        object = reader.getPdfObject(i);
        if (object == null || !object.isDictionary())
            continue;
        font = (PdfDictionary)object;
        if (PdfName.FONTDESCRIPTOR.equals(font.get(PdfName.TYPE))
            && fontname.equals(font.get(PdfName.FONTNAME))) {
            PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
            font.put(PdfName.FONTFILE2, objref.getIndirectReference());
        }
    }
    stamper.close();
    reader.close();
}
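A note on the Length1 entry set in embedFont: for a TrueType font program, ISO 32000-1 defines Length1 as the length of the font program after decoding, which is why the code passes fontbytes.length even though the stream is Flate-compressed. The plain-Java sketch below illustrates the difference between the two lengths with java.util.zip; it does not use iText, and the byte array is a stand-in, not a real font.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class Length1Sketch {

    // Deflate-compresses the input, similar in spirit to a Flate-encoded stream.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates exactly decodedLength bytes; decodedLength plays the role of Length1.
    static byte[] inflate(byte[] compressed, int decodedLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[decodedLength];
        int total = 0;
        try {
            while (total < decodedLength && !inflater.finished()) {
                int n = inflater.inflate(result, total, decodedLength - total);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // no more data available
                }
                total += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        if (total != decodedLength) {
            throw new IllegalStateException("decoded length does not match");
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in bytes with a repetitive pattern; a real font file would be read from disk.
        byte[] fontbytes = new byte[20000];
        for (int i = 0; i < fontbytes.length; i++) {
            fontbytes[i] = (byte) (i % 31);
        }
        byte[] compressed = deflate(fontbytes);
        System.out.println("decoded length (what Length1 records): " + fontbytes.length);
        System.out.println("compressed length (what is stored): " + compressed.length);
        System.out.println("round trip ok: "
                + java.util.Arrays.equals(fontbytes, inflate(compressed, fontbytes.length)));
    }
}
```

A viewer needs the decoded length to interpret the font program, so it must describe the original bytes and not the compressed ones.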

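Now, we have the file testC_merged2.pdf (22.03 KB) and that's actually the answer to your question. As you can see, the second option (testB_merged2.pdf, 21.95 KB) is better than this third option: the result is slightly smaller and it doesn't require the extra embedding step.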

Caveats: This example uses the Gravitas One font as a simple font. As soon as you use the font as a composite font (you tell iText to use it as a composite font by choosing the encoding IDENTITY-H or IDENTITY-V), you can no longer choose whether or not to embed the font, or whether or not to subset it. As defined in ISO-32000-1, iText will always embed composite fonts and will always subset them.

This means that you can't use the above solutions when you need special fonts (Chinese, Japanese, Korean). In that case, you shouldn't embed the fonts, but use so-called CJK fonts. These CJK fonts rely on font packs that can be downloaded by Adobe Reader.
