Updated: 2023-12-05 15:23:16
Late to the party, but here's a simple solution. It assumes that PDF files which already contain fonts are not purely image based:
find ./ -name "*.pdf" -print0 | xargs -0 -I {} \
    bash -c 'file="$1";
    if [ "$(pdffonts "$file" 2> /dev/null |
    wc -l)" -lt 3 ]; then echo "$file"; fi' _ {}
One-liner:
find ./ -name "*.pdf" -print0 | xargs -0 -I {} bash -c 'file="$1"; if [ "$(pdffonts "$file" 2> /dev/null | wc -l)" -lt 3 ]; then echo "$file"; fi' _ {}
Explanation:
pdffonts file.pdf
prints a two-line header followed by one line per font, so it shows more than 2 lines if the PDF contains text.
The command outputs the filenames of all PDF files that don't contain text. Files on which pdffonts fails produce no output lines at all, so they are listed too.
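The line-count check can also be split into small functions, which makes it easier to reuse and to test without a real PDF. This is only a sketch: the function names has_font_rows and report_if_image_only are made up for illustration and are not part of pdffonts or pmOCR, and it assumes bash and that pdffonts keeps its two-line header.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads pdffonts-style output on stdin and succeeds
# only if there are more than 2 lines, i.e. at least one font row follows
# the two header lines.
has_font_rows() {
    (( $(wc -l) > 2 ))
}

# Hypothetical wrapper: prints the file name when pdffonts lists no fonts
# (or fails on the file), which is the same test as "wc -l ... -lt 3".
report_if_image_only() {
    local file="$1"
    if ! pdffonts "$file" 2> /dev/null | has_font_rows; then
        echo "$file"
    fi
}

# Possible usage, reading NUL-separated names so odd file names survive:
# find ./ -name "*.pdf" -print0 |
#     while IFS= read -r -d '' f; do report_if_image_only "$f"; done
```

Passing the file name as an argument rather than pasting it into the script text means names containing quotes or dollar signs are not reinterpreted by the shell.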
My OCR project, which has the same feature, is on GitHub: deajan/pmOCR.