How do I extract text from a Word file using C#?

Updated: 2023-02-12 22:20:35

If you use the Word object model, you must always instantiate a specific version of Word on the client (running Word on a server is not recommended). Unfortunately, you will then be subject to Word's restrictions on older files. For example, Word 2010 opens Office 95 files only in sandbox mode, so you cannot access the file content programmatically. You will also have to deal with unknown template content, such as documents with macros attached.
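For reference, the object-model route looks roughly like the sketch below. It assumes that Word is installed on the client and that the project references the Microsoft.Office.Interop.Word interop assembly. The class and method names (`WordTextExtractor`, `ExtractText`) are made up for illustration. It is a minimal sketch, not a hardened implementation, and it does not work around the restrictions described above.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

static class WordTextExtractor
{
    // Hypothetical helper: returns the plain text of the document at `path`.
    // Assumes Word is installed locally; `path` should be an absolute path.
    public static string ExtractText(string path)
    {
        Word.Application app = null;
        Word.Document doc = null;
        try
        {
            // Starts a Word process in the background.
            app = new Word.Application();
            app.Visible = false;
            app.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone;

            // Open read-only so the source file is never modified.
            doc = app.Documents.Open(FileName: path, ReadOnly: true, AddToRecentFiles: false);

            // Content is a Range covering the main document story.
            return doc.Content.Text;
        }
        finally
        {
            // Always close the document and quit Word, otherwise
            // WINWORD.EXE processes are left running.
            if (doc != null)
            {
                doc.Close(SaveChanges: false);
                Marshal.ReleaseComObject(doc);
            }
            if (app != null)
            {
                app.Quit(SaveChanges: false);
                Marshal.ReleaseComObject(app);
            }
        }
    }
}
```

A call such as `WordTextExtractor.ExtractText(@"C:\docs\report.doc")` (the path is a placeholder) returns the body text only. If Word refuses to open the file, `Documents.Open` will likely throw a `COMException`, which the caller would have to handle.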

In your case I would rather look for a third-party component that gives you access to the content. I know that document management systems such as OpenText eDocs and Autonomy iManage use other tools to full-text index documents of all types and to present the content in a viewer application. If you look in that direction, you may find something useful.