且构网


How to retrieve and store text and images from a PDF file using C#

Updated: 2023-01-30 17:12:48



Can anybody help me retrieve and store text and images from a PDF file using C#?
I have a PDF file that contains both images and text in a table format.

See this file: https://drive.google.com/file/d/0B8h0NyF1SxcEWXZHdlp5TjRJXzA/view?usp=sharing

I want to import that PDF content into my WPF ListView.

https://psycodedeveloper.wordpress.com/2013/01/10/how-to-extract-images-from-pdf-files-using-c-and-itextsharp/
https://bytescout.com/products/developer/pdfextractorsdk/how-to-extract-images-from-pdf-page-by-page-in-c-%2523

And one excellent article: Extract Text from PDF in C# (100% .NET)
What's wrong with all of these?

-KR


http://bitmiracle.com/pdf-library/ can be used to extract images from PDFs:

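using BitMiracle.Docotic.Pdf; // Docotic.Pdf, the library linked above

static void ExtractAllImages()
{
    // Set this to the location of the PDF file to read.
    string path = "";
    using (PdfDocument pdf = new PdfDocument(path))
    {
        // Walk every image in the document and write each one to disk.
        for (int i = 0; i < pdf.Images.Count; i++)
        {
            string imageName = string.Format("image{0}", i);
            // Save returns the path of the file it wrote.
            string imagePath = pdf.Images[i].Save(imageName);
        }
    }
}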
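
Since the goal is to show the extracted content in a WPF ListView, the saved image paths could be collected into a bindable collection. The following is only a rough sketch using plain .NET types: the ExtractedImage class, the ImageRows helper, and the sample file names are all hypothetical and are not part of any PDF library.

```csharp
using System;
using System.Collections.ObjectModel;
using System.IO;

// Hypothetical row model: one extracted image per list row.
public sealed class ExtractedImage
{
    public int Index { get; set; }
    public string FilePath { get; set; }
    public string FileName => Path.GetFileName(FilePath);
}

public static class ImageRows
{
    // Builds a bindable collection from the paths gathered while saving images.
    public static ObservableCollection<ExtractedImage> FromPaths(string[] imagePaths)
    {
        var rows = new ObservableCollection<ExtractedImage>();
        for (int i = 0; i < imagePaths.Length; i++)
        {
            rows.Add(new ExtractedImage { Index = i, FilePath = imagePaths[i] });
        }
        return rows;
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder paths standing in for the values collected from Save(...).
        var rows = ImageRows.FromPaths(new[] { "image0.png", "image1.jpg" });
        foreach (var row in rows)
        {
            Console.WriteLine("{0}: {1}", row.Index, row.FileName);
        }
        // In a WPF window, the collection could be assigned to ListView.ItemsSource.
    }
}
```

An ObservableCollection is used here so that a bound ListView would pick up rows added later; a plain List would also work if the list is filled once before binding.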



This seems to be a good starting point. Good luck!