How do I read the text of a Word file, excluding table data?

Updated: 2022-10-18 17:14:05

Hi,

In my application I have to read the content of a Word (DOCX) file line by line, excluding other objects (tables, charts, etc.). With the code below I can read the content, but it also includes the text from tables.

private void GetParaDetail(Word.Document doc)
{
    foreach (Word.Paragraph para in doc.Paragraphs)
    {
        // doc.Paragraphs also yields the paragraphs inside table cells
        string temp = para.Range.Text.Trim();
    }
}

I uploaded a sample file here (https://1drv.ms/w/s!Ah-Jh2Ok5SuHcCKzdzlY6etFDv8). Running the code above on that file gives me the following paragraphs, in this order:

1111111111111
2222222222222
3333333333333
4444444444444
5555555555555
.
.
.
.
kkkkkkkkkkk

But I need only the text below. I searched a lot but didn't find anything helpful; every answer I found refers to the same code as above.

1111111111111
2222222222222
kkkkkkkkkkk

The simplest method might be to delete all shapes, inline shapes, and tables from the document. Instead of deleting the tables, though, you might consider converting them to text. Once you've deleted or converted that content, you can read the whole document in one pass. In VBA, that could be as simple as:

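Sub Demo()
' Note: this changes the active document, so work on a copy or close without saving.
With ActiveDocument
  ' Word collections are 1-based; deleting item 1 until Count is 0 empties them.
  Do While .InlineShapes.Count > 0
    .InlineShapes(1).Delete
  Loop
  Do While .Shapes.Count > 0
    .Shapes(1).Delete
  Loop
  Do While .Tables.Count > 0
    .Tables(1).Delete
  Loop
End With
End Sub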

I'll leave it to you to do the C# implementation.
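
For readers who want a starting point, here is a minimal sketch of how the macro above might be ported to C#. It assumes the same Microsoft.Office.Interop.Word reference and `Word` alias as the question's code, and a C# 4 or later compiler so that the COM indexers can be called without `ref`. The class name `WordTextReader`, the method name `ReadTextWithoutTables`, and the filtering of empty paragraphs are illustrative choices, not part of the original answer. I have not run it against the sample file.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class WordTextReader
{
    // Hypothetical helper: removes inline shapes, floating shapes and tables
    // from the open document, then returns the text of the paragraphs that remain.
    // The document is modified in memory, so the caller should close it without saving.
    public static List<string> ReadTextWithoutTables(Word.Document doc)
    {
        // Word collections are 1-based; deleting item 1 until Count is 0 empties them.
        while (doc.InlineShapes.Count > 0)
            doc.InlineShapes[1].Delete();

        while (doc.Shapes.Count > 0)
            doc.Shapes[1].Delete();

        while (doc.Tables.Count > 0)
            doc.Tables[1].Delete();

        var lines = new List<string>();
        foreach (Word.Paragraph para in doc.Paragraphs)
        {
            // Trim() drops the trailing paragraph mark (\r).
            string text = para.Range.Text.Trim();

            // Assumption: empty paragraphs are not wanted in the result.
            if (text.Length > 0)
                lines.Add(text);
        }
        return lines;
    }
}
```

If you call something like `doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges)` afterwards, the deletions should not be written back to the file. If you would rather keep the table content as plain text, as suggested above, `Table.ConvertToText` could replace the `Delete` call in the last loop.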