且构网

Can a PDF document contain images with different DPI?

Updated: 2023-11-29 15:52:34

The question says it all: are there PDF documents that contain images with different DPI (dots per inch)?

Or can I assume that if I know the DPI of one image, I know it for the whole document?

I upvoted @ypnos' answer, which is completely correct.

But I'd like to complement it by showing a fairly recent feature of the pdfimages utility.

pdfimages was previously known for extracting images from PDF files (and that was its only useful purpose). Now you can also use it to inspect details about the embedded images without extracting them.

With the next command I query for the data of all images on pages 7 and 8 of a certain PDF file, using the new -list parameter:

pdfimages -list -f 7 -l 8  ct-magazin-14-2012.pdf

  page   num  type   width height color comp bpc  enc interp  object ID
  ---------------------------------------------------------------------
     7     0 image     581   838  rgb     3   8  jpeg   no        39  0
     7     1 image       4     4  rgb     3   8  image  no        40  0
     7     2 image     314   332  rgb     3   8  jpx    no        44  0
     7     3 image     358   430  rgb     3   8  jpx    no        45  0
     7     4 image       4     4  rgb     3   8  image  no        46  0
     7     5 image       4     4  rgb     3   8  image  no        47  0
     7     6 image       4     6  rgb     3   8  image  no        48  0
     7     7 image     596   462  rgb     3   8  jpx    no        49  0
     7     8 image       4     6  rgb     3   8  image  no        50  0
     7     9 image       4     4  rgb     3   8  image  no        51  0
     7    10 image       8    10  rgb     3   8  image  no        41  0
     7    11 image       6     6  rgb     3   8  image  no        42  0
     7    12 image     113    27  rgb     3   8  jpx    no        43  0
     8    13 image     582   839  gray    1   8  jpeg   no      2080  0
     8    14 image     344   364  gray    1   8  jpx    no      2079  0

Note, however: this version of pdfimages is the one from Poppler (the one from XPDF does not (yet?) support this new feature):

pdfimages -version

  pdfimages version 0.20.2
  Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org
  Copyright 1996-2011 Glyph & Cog, LLC

The -list option appeared for the first time in Poppler v0.19.0, released on March 1st, 2012.
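Since the option depends on which pdfimages you have, a script may want to inspect the version banner before relying on -list. The following is only a minimal sketch: it assumes the banner looks like the one shown above, the function names `parse_poppler_version` and `supports_list` are made up for this illustration, and treating every non-Poppler build as unsupported is an assumption based on the note above about XPDF.

```python
import re

def parse_poppler_version(banner: str) -> tuple[int, ...]:
    """Extract the numeric version from a 'pdfimages version X.Y.Z' banner."""
    match = re.search(r"pdfimages version (\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError("no pdfimages version found in banner")
    return tuple(int(part) for part in match.group(1).split("."))

def supports_list(banner: str) -> bool:
    """True if the banner is from Poppler and the version is at least 0.19.0."""
    if "Poppler" not in banner:
        # Assumption: builds that do not mention Poppler (e.g. XPDF) lack -list.
        return False
    return parse_poppler_version(banner) >= (0, 19, 0)

banner = (
    "pdfimages version 0.20.2\n"
    "Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org\n"
)
print(parse_poppler_version(banner), supports_list(banner))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.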

Now, the above list does not directly tell you the resolution ("dpi") of an image. That value depends on the size at which the image is rendered on the PDF page: the same pixels spread over a larger area yield a lower resolution.

A PDF can easily use the same image at several places in the file, with a different rendering size each time, so each placement has its own effective resolution. The image needs to be embedded into the PDF only once but can be used/rendered "by reference" multiple times. (Inefficiently constructed PDFs may still contain the same image multiple times, but that's a different topic.)
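To make the dependency concrete, here is a small sketch of the arithmetic. It assumes the default PDF unit of 1/72 inch per point; the function name `effective_ppi` and the placement sizes are hypothetical and not taken from the file discussed here.

```python
POINTS_PER_INCH = 72  # default PDF user space: 1 point = 1/72 inch

def effective_ppi(pixels: int, rendered_points: float) -> float:
    """Resolution of an image dimension when drawn at the given size in points."""
    return pixels * POINTS_PER_INCH / rendered_points

# Hypothetical: one 600-pixel-wide image placed twice at different widths.
print(effective_ppi(600, 288))  # drawn 4 inches wide
print(effective_ppi(600, 576))  # drawn 8 inches wide
```

The same embedded pixels give 150 ppi at the smaller placement and 75 ppi at the larger one, which is why a resolution belongs to a placement rather than to the image data alone.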

Now let's clear up the questions which may arise from looking at the respective column headings. What do they mean?

page

  • The page number in the PDF containing the image.

num

  • The image number of the current listing.

type

  • The image type. Possible values are: image (an opaque image), mask (a monochrome image mask), smask (a soft-mask image) and stencil (a monochrome mask image used for painting a color or a pattern). Note: Transparency for images in PDF is created by using two separate PDF objects: one for the image and one for the mask or smask. The mask/smask belonging to a transparent image always directly follows that image in the listing.

width

  • The image width in pixels.

height

  • The image height in pixels.

color

  • The image color space. Possible values are gray, rgb, cmyk, lab (L*a*b), icc (ICC based), index (indexed colors), sep (separation) and devn (DeviceN).

comp

  • The number of color components used by the image.

bpc

  • The bits per color component used by the image.

enc

  • The encoding (compression) used by the image. Possible values are: image (a raster image -- may internally use the generic /Flate or /LZW compression, but not a special image encoding), jpeg (JPEG compression), jpx (JPEG2000 compression), jbig2 (JBIG2 compression) and ccitt (Fax compression).

interp

  • yes if interpolation was requested when scaling up the image, otherwise no.

object ID

  • The image's PDF object ID (with "generation number") inside the file.
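The column meanings above map naturally onto a small parser. This is a sketch only: `parse_listing` is a name invented here, and it assumes every data row has the twelve whitespace-separated fields seen in the listing above (rows of other types may be formatted differently).

```python
def parse_listing(text: str) -> list[dict]:
    """Turn 'pdfimages -list' output (original 12-field layout) into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a page number; header and rule lines do not.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append({
            "page": int(fields[0]),
            "num": int(fields[1]),
            "type": fields[2],
            "width": int(fields[3]),
            "height": int(fields[4]),
            "color": fields[5],
            "comp": int(fields[6]),
            "bpc": int(fields[7]),
            "enc": fields[8],
            "interp": fields[9] == "yes",
            # Object number and generation number together form the object ID.
            "object_id": (int(fields[10]), int(fields[11])),
        })
    return rows

sample = """\
  page   num  type   width height color comp bpc  enc interp  object ID
  ---------------------------------------------------------------------
     7     0 image     581   838  rgb     3   8  jpeg   no        39  0
     8    13 image     582   839  gray    1   8  jpeg   no      2080  0
"""
for row in parse_listing(sample):
    print(row["page"], row["width"], row["height"], row["color"], row["object_id"])
```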

Update (March 2016)

As of Poppler v0.25.0 (released December 11, 2013) and later versions, the command pdfimages -list now includes new columns which indicate the automatically calculated x-ppi (horizontal) and y-ppi (vertical) resolutions for each embedded image as displayed within the PDF page by the PDF renderer.

In addition, the listing shows the size (in bytes/kilobytes) that each image occupies as embedded in the PDF, as well as its compression ratio (the embedded size relative to the uncompressed size).

To show the result (using Poppler v0.42.0) for the same file as above:

page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
   7  0 image   581   838  rgb     3   8 jpeg   no       39 0    73    73 2107B 0.1%
   7  1 image     4     4  rgb     3   8 image  no       40 0   150   150   54B 112%
   7  2 image   314   332  rgb     3   8 jpx    no       44 0   150   150 19.0K 6.2%
   7  3 image   358   430  rgb     3   8 jpx    no       45 0   150   150 15.7K 3.5%
   7  4 image     4     4  rgb     3   8 image  no       46 0   150   150   62B 129%
   7  5 image     4     4  rgb     3   8 image  no       47 0   150   150   51B 106%
   7  6 image     4     6  rgb     3   8 image  no       48 0   150   150   62B  86%
   7  7 image   596   462  rgb     3   8 jpx    no       49 0   150   150 40.7K 5.0%
   7  8 image     4     6  rgb     3   8 image  no       50 0   150   150   86B 119%
   7  9 image     4     4  rgb     3   8 image  no       51 0   150   150   62B 129%
   7 10 image     8    10  rgb     3   8 image  no       41 0   150   150  157B  65%
   7 11 image     6     6  rgb     3   8 image  no       42 0   150   150   82B  76%
   7 12 image   113    27  rgb     3   8 jpx    no       43 0   151   152 1090B  12%
   8 13 image   582   839  gray    1   8 jpeg   no     2080 0    72    72  319B 0.1%
   8 14 image   344   364  gray    1   8 jpx    no     2079 0   150   150 4325B 3.5%
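This listing answers the original question directly: images on the same page are rendered at different resolutions. As a rough sketch, the distinct (x-ppi, y-ppi) pairs per page could be collected like this. The function name `ppi_by_page` is invented for this illustration, and the code assumes the 16-field row layout and whole-number ppi values shown above.

```python
def ppi_by_page(listing: str) -> dict[int, set[tuple[int, int]]]:
    """Collect the distinct (x-ppi, y-ppi) pairs found on each page."""
    result: dict[int, set[tuple[int, int]]] = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header and rule lines; data rows begin with a page number.
        if len(fields) < 16 or not fields[0].isdigit():
            continue
        page = int(fields[0])
        # Fields 12 and 13 are x-ppi and y-ppi (after object and generation numbers).
        result.setdefault(page, set()).add((int(fields[12]), int(fields[13])))
    return result

sample = """\
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
   7  0 image   581   838  rgb     3   8 jpeg   no       39 0    73    73 2107B 0.1%
   7  2 image   314   332  rgb     3   8 jpx    no       44 0   150   150 19.0K 6.2%
   7 12 image   113    27  rgb     3   8 jpx    no       43 0   151   152 1090B  12%
   8 14 image   344   364  gray    1   8 jpx    no     2079 0   150   150 4325B 3.5%
"""
for page, pairs in sorted(ppi_by_page(sample).items()):
    print(page, sorted(pairs))
```

A page whose set holds more than one pair contains images at different resolutions, so knowing the resolution of one image says nothing reliable about the others.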

x-ppi

  • The horizontal resolution of the image (in pixels per inch) when rendered on the PDF page.

y-ppi

  • The vertical resolution of the image (in pixels per inch) when rendered on the PDF page.

size

  • The size of the embedded image in the PDF file. The following suffixes are used: 'B' for bytes, 'K' for kilobytes, 'M' for megabytes, and 'G' for gigabytes.

ratio

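  • The compression ratio of the embedded image: its embedded size as a percentage of its uncompressed size. Values above 100% (as for some of the tiny 4x4 images above) mean the embedded form is larger than the raw pixel data.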
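The ratio column can be cross-checked by hand from the other columns. The sketch below is an assumption about how the numbers relate, not a copy of Poppler's code: it takes the uncompressed size to be width x height x comp x bpc / 8 bytes, which matches the figures in the table above. The function names are made up for illustration.

```python
def uncompressed_bytes(width: int, height: int, comp: int, bpc: int) -> int:
    """Raw pixel data size, assuming rows are packed without padding."""
    return width * height * comp * bpc // 8

def ratio_percent(embedded_bytes: int, width: int, height: int,
                  comp: int, bpc: int) -> float:
    """Embedded size as a percentage of the uncompressed size."""
    return 100 * embedded_bytes / uncompressed_bytes(width, height, comp, bpc)

# Image 1 on page 7: 4x4 rgb, 8 bpc, 54 bytes embedded (listed as 112%).
print(round(ratio_percent(54, 4, 4, 3, 8), 1))
# Image 0 on page 7: 581x838 rgb, 8 bpc, 2107 bytes embedded (listed as 0.1%).
print(round(ratio_percent(2107, 581, 838, 3, 8), 2))
```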