Updated: 2023-11-29 15:52:34
The question says it all: are there PDF documents that contain images with different dpi (dots per inch)?
Or is it safe to assume that if I know the dpi of one image, I know it for the whole document?
I upvoted @ypnos' answer, which is completely correct.
But I'd like to complement it by showing a very recent feature of the pdfimages utility.
pdfimages was previously known for extracting images from PDF files (and that was its only useful purpose). Now, however, you can also use it to inspect details about the images used, without extracting them.
With the next command I query the data of all images on pages 7 and 8 of a certain PDF file, using the new -list parameter:
pdfimages -list -f 7 -l 8 ct-magazin-14-2012.pdf
page num type width height color comp bpc enc interp object ID
---------------------------------------------------------------------
7 0 image 581 838 rgb 3 8 jpeg no 39 0
7 1 image 4 4 rgb 3 8 image no 40 0
7 2 image 314 332 rgb 3 8 jpx no 44 0
7 3 image 358 430 rgb 3 8 jpx no 45 0
7 4 image 4 4 rgb 3 8 image no 46 0
7 5 image 4 4 rgb 3 8 image no 47 0
7 6 image 4 6 rgb 3 8 image no 48 0
7 7 image 596 462 rgb 3 8 jpx no 49 0
7 8 image 4 6 rgb 3 8 image no 50 0
7 9 image 4 4 rgb 3 8 image no 51 0
7 10 image 8 10 rgb 3 8 image no 41 0
7 11 image 6 6 rgb 3 8 image no 42 0
7 12 image 113 27 rgb 3 8 jpx no 43 0
8 13 image 582 839 gray 1 8 jpeg no 2080 0
8 14 image 344 364 gray 1 8 jpx no 2079 0
Note, however: this version of pdfimages is the one from Poppler (the one from XPDF does not (yet?) support this new feature):
pdfimages -version
pdfimages version 0.20.2
Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
The -list option appeared for the first time in Poppler v0.19.0, released on March 1st, 2012.
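If a script depends on -list, it can check the version string first. The following is a minimal sketch, assuming you have already captured the text printed by `pdfimages -version` (in the format shown above); the function names are hypothetical, and treating non-Poppler builds as unsupported is an assumption based on the note about XPDF.

```python
import re

# Poppler release that first shipped `pdfimages -list`, per the note above.
LIST_MIN_VERSION = (0, 19, 0)


def parse_pdfimages_version(output: str) -> tuple[int, ...]:
    """Return the version tuple from `pdfimages -version` output.

    Assumes a line of the form "pdfimages version 0.20.2", as in the sample above.
    """
    match = re.search(r"pdfimages version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("no 'pdfimages version X.Y.Z' line found")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_list(output: str) -> bool:
    """True for a Poppler build of pdfimages that is at least v0.19.0.

    Assumption: non-Poppler builds (e.g. XPDF's) are treated as unsupported,
    following the note above; the copyright line is used to tell them apart.
    """
    if "Poppler" not in output:
        return False
    return parse_pdfimages_version(output) >= LIST_MIN_VERSION


sample = """pdfimages version 0.20.2
Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC"""

print(parse_pdfimages_version(sample))  # (0, 20, 2)
print(supports_list(sample))            # True
```

Comparing version tuples rather than strings avoids the classic mistake of "0.9" sorting after "0.19".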
Now, the above list does not directly tell you the resolution ("dpi") of the image. That value depends on the size at which the image is rendered on the PDF page.
A PDF can easily have the same image used at different spots of a PDF file, using a different rendering size for each occasion. The image needs to be embedded into the PDF only once but can be used/rendered 'by reference' multiple times (inefficiently constructed PDFs may still contain the same image multiple times, but that's a different topic...)
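To make the dependency on rendering size concrete: a PDF image is painted into the unit square of image space, and the current transformation matrix [a b c d e f] maps that square onto the page, so the rendered width and height in points (1/72 inch) are the lengths of the transformed unit vectors. The sketch below computes the effective resolution from that; the function name and the example matrices are hypothetical, and this is not necessarily how Poppler itself computes its values.

```python
import math


def effective_ppi(width_px: int, height_px: int,
                  ctm: tuple[float, float, float, float, float, float]) -> tuple[float, float]:
    """Resolution (pixels per inch) of an image as placed on the page.

    The image's unit square is mapped by the CTM [a b c d e f]; hypot(a, b)
    and hypot(c, d) give the rendered width and height in points.
    """
    a, b, c, d, _e, _f = ctm
    x_ppi = width_px * 72.0 / math.hypot(a, b)
    y_ppi = height_px * 72.0 / math.hypot(c, d)
    return x_ppi, y_ppi


# The same hypothetical 300x200 px image, embedded once but drawn at two sizes:
print(effective_ppi(300, 200, (144, 0, 0, 96, 72, 500)))  # (150.0, 150.0): drawn 2 in wide
print(effective_ppi(300, 200, (72, 0, 0, 48, 72, 100)))   # (300.0, 300.0): drawn 1 in wide
```

Because the same pixels can be referenced with different matrices, one embedded image can have several "dpi" values in the same document.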
Now let's clear up the questions which may arise from looking at the respective column headings. What do they mean?
page: the page number.
num: the image number.
type: the image type: image (an opaque image), mask (a monochrome image mask), smask (a soft-mask image) and stencil (a monochrome mask image used for painting a color or a pattern). Note: Transparency in PDF for images is created by using two separate PDF objects: one for the image and one for the mask or smask. The mask/smask belonging to a transparent image always directly follows the image in the listing.
width: the image width in pixels (the size of the embedded image, not the size at which it is rendered).
height: the image height in pixels (likewise the embedded size).
color: the image color space: gray, rgb, cmyk, lab (L*a*b), icc (ICC based), index (indexed colors), sep (separation) and devn (DeviceN).
comp: the number of color components.
bpc: the number of bits per component.
enc: the encoding: image (a raster image, which may internally use the generic /Flate or /LZW compression, but not a special image encoding), jpeg (JPEG compression), jpx (JPEG2000 compression), jbig2 (JBIG2 compression) and ccitt (Fax compression).
interp: yes if interpolation was requested when scaling up the image.
object ID: the object number and generation number of the image object in the PDF file.
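If you want to post-process the listing, the column meanings translate directly into a record type. This is a minimal sketch, assuming whitespace-separated fields in the order shown in the first listing, with "object ID" occupying two fields (object number and generation); the class and function names are hypothetical, and any extra trailing columns are simply ignored.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    page: int
    num: int
    image_type: str   # image, mask, smask or stencil
    width: int        # embedded pixels, not rendered size
    height: int
    color: str
    comp: int
    bpc: int
    enc: str
    interp: bool
    object_num: int
    object_gen: int


def parse_list_row(line: str) -> ImageInfo:
    """Parse one data row of `pdfimages -list` output.

    Needs at least the 12 whitespace-separated fields of the basic layout;
    additional columns printed by newer versions are ignored.
    """
    f = line.split()
    if len(f) < 12:
        raise ValueError(f"expected at least 12 fields, got {len(f)}: {line!r}")
    return ImageInfo(
        page=int(f[0]), num=int(f[1]), image_type=f[2],
        width=int(f[3]), height=int(f[4]), color=f[5],
        comp=int(f[6]), bpc=int(f[7]), enc=f[8],
        interp=(f[9] == "yes"),
        object_num=int(f[10]), object_gen=int(f[11]),
    )


row = parse_list_row("7 2 image 314 332 rgb 3 8 jpx no 44 0")
print(row.width, row.height, row.enc, row.object_num)  # 314 332 jpx 44
```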
As of Poppler v0.25.0 (released December 11, 2013), the command pdfimages -list includes new columns which indicate the automatically calculated x-ppi (horizontal) and y-ppi (vertical) resolutions of each embedded image as displayed on the PDF page by the PDF renderer.
In addition, it shows the size (in bytes/kilobytes) each image occupies as embedded in the PDF, as well as its compression ratio (the embedded size relative to the uncompressed image size).
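The ratio can be reproduced from the other columns: the uncompressed raster needs width × height × comp × bpc bits. The sketch below, with hypothetical function names, recomputes it; treating the K/M/G suffixes as powers of 1024 is an assumption, chosen because it matches rows of the listing for this file (e.g. 19.0K for the 314x332 RGB image gives 6.2%).

```python
import re

# Assumption: K/M/G are powers of 1024 (consistent with the sample listing).
_SUFFIX = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a size column value such as '2107B' or '19.0K' to bytes."""
    m = re.fullmatch(r"([\d.]+)([BKMG])", text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(m.group(1)) * _SUFFIX[m.group(2)]


def compression_ratio(width: int, height: int, comp: int, bpc: int, size: str) -> float:
    """Embedded size divided by the uncompressed raster size."""
    uncompressed_bytes = width * height * comp * bpc / 8
    return parse_size(size) / uncompressed_bytes


print(f"{compression_ratio(4, 4, 3, 8, '62B'):.0%}")        # 129%
print(f"{compression_ratio(314, 332, 3, 8, '19.0K'):.1%}")  # 6.2%
```

Ratios above 100% for the tiny 4x4 images simply mean the stream overhead outweighs any compression gain.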
To show the result (using Poppler v0.42.0) for the same file as above:
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
7 0 image 581 838 rgb 3 8 jpeg no 39 0 73 73 2107B 0.1%
7 1 image 4 4 rgb 3 8 image no 40 0 150 150 54B 112%
7 2 image 314 332 rgb 3 8 jpx no 44 0 150 150 19.0K 6.2%
7 3 image 358 430 rgb 3 8 jpx no 45 0 150 150 15.7K 3.5%
7 4 image 4 4 rgb 3 8 image no 46 0 150 150 62B 129%
7 5 image 4 4 rgb 3 8 image no 47 0 150 150 51B 106%
7 6 image 4 6 rgb 3 8 image no 48 0 150 150 62B 86%
7 7 image 596 462 rgb 3 8 jpx no 49 0 150 150 40.7K 5.0%
7 8 image 4 6 rgb 3 8 image no 50 0 150 150 86B 119%
7 9 image 4 4 rgb 3 8 image no 51 0 150 150 62B 129%
7 10 image 8 10 rgb 3 8 image no 41 0 150 150 157B 65%
7 11 image 6 6 rgb 3 8 image no 42 0 150 150 82B 76%
7 12 image 113 27 rgb 3 8 jpx no 43 0 151 152 1090B 12%
8 13 image 582 839 gray 1 8 jpeg no 2080 0 72 72 319B 0.1%
8 14 image 344 364 gray 1 8 jpx no 2079 0 150 150 4325B 3.5%
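x-ppi: the horizontal resolution of the image (in pixels per inch) as rendered on the PDF page.
y-ppi: the vertical resolution of the image (in pixels per inch) as rendered on the PDF page.
size: the size of the embedded image data in the PDF file (B = bytes, K = kilobytes, M = megabytes, G = gigabytes).
ratio: the compression ratio, i.e. the embedded size as a percentage of the uncompressed image size.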
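With the x-ppi and y-ppi columns, the original question can be answered mechanically: count the distinct resolution pairs in the listing. This is a minimal sketch, assuming the 16-field layout of the Poppler v0.42.0 listing above (x-ppi and y-ppi are the 13th and 14th fields); the function name is hypothetical. On the rows from pages 7 and 8 it already finds 72, 73, 150 and 151/152 ppi, so one document can clearly mix resolutions.

```python
from collections import Counter


def ppi_counts(listing: str) -> Counter:
    """Count images per (x-ppi, y-ppi) pair in `pdfimages -list` output.

    Header and separator lines are skipped because their first field
    is not a page number.
    """
    counts: Counter = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        counts[(int(float(fields[12])), int(float(fields[13])))] += 1
    return counts


sample = """page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
7 0 image 581 838 rgb 3 8 jpeg no 39 0 73 73 2107B 0.1%
7 2 image 314 332 rgb 3 8 jpx no 44 0 150 150 19.0K 6.2%
7 12 image 113 27 rgb 3 8 jpx no 43 0 151 152 1090B 12%
8 13 image 582 839 gray 1 8 jpeg no 2080 0 72 72 319B 0.1%"""

counts = ppi_counts(sample)
print(len(counts) > 1)  # True: this document mixes resolutions
```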