Updated: 2022-10-18 17:14:05
Hi,
In my application I have to read the content of a Word (DOCX) file line by line, excluding other objects such as tables and charts. With the code below I can read the content, but it also includes the text from tables.
private void GetParaDetail(Word.Document doc)
{
    foreach (Word.Paragraph para in doc.Paragraphs)
    {
        string temp = para.Range.Text.Trim();
    }
}
I uploaded a file to this location (https://1drv.ms/w/s!Ah-Jh2Ok5SuHcCKzdzlY6etFDv8). Running the code above on that file, I get the following paragraphs in sequence:
1111111111111 2222222222222 3333333333333 4444444444444 5555555555555 . . . . kkkkkkkkkkk
but I need the text below. I searched a lot but didn't find any helpful information; every result refers to the code above.
1111111111111 2222222222222 kkkkkkkkkkk
The simplest method might be to delete all shapes, inline shapes and tables from the document. Instead of deleting tables, though, you might consider converting them to text. Once you've deleted or converted that content, you can read the whole document in one pass. In VBA that could be as simple as:
Sub Demo()
  With ActiveDocument
    Do While .InlineShapes.Count > 0
      .InlineShapes(1).Delete
    Loop
    Do While .Shapes.Count > 0
      .Shapes(1).Delete
    Loop
    Do While .Tables.Count > 0
      .Tables(1).Delete
    Loop
  End With
End Sub
I'll leave it to you to do the C# implementation.
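For anyone attempting that port, a rough C# sketch of the same macro might look like the following. It assumes a reference to Microsoft.Office.Interop.Word, C# 4 or later (so the 1-based COM indexers can be written as `[1]`), and a document that is already open. The class and method names (`DocTextReader`, `ReadBodyParagraphs`) are made up for illustration and do not come from the thread.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class DocTextReader
{
    // Hypothetical helper: removes inline shapes, floating shapes and tables
    // from the open document, then returns the text of the remaining
    // non-empty paragraphs. It changes the in-memory document.
    public static List<string> ReadBodyParagraphs(Word.Document doc)
    {
        // Word collections are 1-based. Deleting item 1 until the
        // collection is empty mirrors the VBA Do While loops.
        while (doc.InlineShapes.Count > 0)
            doc.InlineShapes[1].Delete();

        while (doc.Shapes.Count > 0)
            doc.Shapes[1].Delete();

        while (doc.Tables.Count > 0)
            doc.Tables[1].Delete();

        var lines = new List<string>();
        foreach (Word.Paragraph para in doc.Paragraphs)
        {
            // Range.Text ends with the paragraph mark; Trim() drops it.
            string text = para.Range.Text.Trim();
            if (text.Length > 0)
                lines.Add(text);
        }
        return lines;
    }
}
```

Because the deletions alter the open document, the caller would presumably close it without saving, or work on a copy, so that the original file keeps its tables and shapes.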