Download List

Project Description

docx2txt is a tool that attempts to generate equivalent text files from (even corrupted) Microsoft .docx documents, preserving some formatting and document information (which MS text conversion drops) along with appropriate character conversions for a good (ASCII) text experience.

It is a platform-independent solution consisting of a core Perl script, Unix/Windows wrapper shell scripts, and a configuration file that controls the appearance of the output text to a fair extent. It depends on a command-line unzipping program (such as unzip, 7z, pkzipc, or wzunzip) that can silently extract single files from zip archives to the console, standard output, or a pipe.
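The single-file extraction step described above can be sketched in Python. This is only an illustration and not docx2txt's Perl code: it uses Python's standard zipfile module in place of an external unzipper, and the helper names read_document_xml and crude_text, as well as the very rough tag stripping, are assumptions made for the example. The main body of a .docx file is conventionally stored in the archive member word/document.xml.

```python
import re
import zipfile


def read_document_xml(docx_source):
    """Return the raw XML of the main document part of a .docx archive.

    docx_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_source) as archive:
        # Similar in spirit to `unzip -p file.docx word/document.xml`.
        return archive.read("word/document.xml").decode("utf-8")


def crude_text(xml):
    """Rough text extraction: paragraph ends become newlines, tags are dropped."""
    xml = re.sub(r"</w:p>", "\n", xml)
    return re.sub(r"<[^>]+>", "", xml)
```

A real converter also has to handle character entities, tabs, lists, hyperlinks, and the formatting options mentioned above, which this sketch ignores.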

It can conveniently be used to build a web-based docx document conversion service. Makefiles and Windows batch files are provided for easy installation of the scripts. With unzippers such as CakeCmd that can deal with corrupt Zip archives, this tool can in many cases extract text from corrupt docx documents that the MS word processor fails even to open.

System Requirements

System requirements are not defined.
Information regarding Project Releases and Project Resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2012-01-15 11:10
1.2

Tags: Major feature enhancements
The Perl script can now take input from stdin, and also works with input/output redirection. Script files and the configuration file can now be installed in separate directories on non-Windows systems when the Makefile is used for installation. The configuration file is now uniformly looked for in the current directory, the user configuration directory, and the system configuration directory, in that order. Handling of special (non-text) characters has been improved, and more non-text characters, such as fractions, are supported.
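The configuration lookup order in this release note can be sketched as a first-match search over a list of directories. This is a minimal Python illustration, not the tool's Perl implementation; the function name find_config and the directory paths in SEARCH_DIRS are hypothetical, since the note does not say where the user and system configuration directories are.

```python
import os


def find_config(filename, search_dirs):
    """Return the path of the first existing config file, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Hypothetical locations, listed in the order the release note gives:
# current directory, user configuration directory, system configuration directory.
SEARCH_DIRS = [
    os.getcwd(),
    os.path.expanduser("~/.config"),  # assumed user config directory
    "/etc",                           # assumed system config directory
]
```

Because the search stops at the first hit, a file in the current directory overrides a per-user file, which in turn overrides a system-wide one.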

2011-12-13 07:28
1.1

Tags: Minor feature enhancements and bug fixes
Minor non-extraction feature enhancements and bug fixes, based on feedback received from users. The existence of the unzip command is now checked. The configuration file is looked for in $HOME as well. Configuration variables now begin with config_. Bugs #3003903, #3082018, and #3082035 have been fixed. The null device for Cygwin has been fixed. Superscripted cross-references are now placed within [...].

2009-10-05 18:21
1.0

Tags: Major feature enhancements
This release focuses mainly on user interaction aspects. The new features are a Windows installation script, a Windows wrapper script, support for using CakeCmd in addition to Unzip, a configuration file, and support for working with a directory holding the unzipped content of a .docx file. Handling of short line justification has been improved; many cases that the earlier approach missed are now captured. Path names containing spaces are now handled.

2009-09-06 16:43
0.4

Display of hyperlinks is configurable. TOC related cleanup was done. Many new character conversions were implemented. Character conversion tables were added. Currency characters are converted to full currency names. Code tweaks were done to speed up the conversion process.
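The idea of a character conversion table that turns currency symbols into full names can be sketched as a simple lookup. The table contents and the function name replace_currency below are hypothetical; the actual mappings used by docx2txt are not given here and may differ.

```python
# Hypothetical table; the real tool's mappings may differ.
CURRENCY_NAMES = {
    "\u20ac": "Euro",   # euro sign
    "\u00a3": "Pound",  # pound sign
    "\u00a5": "Yen",    # yen sign
}


def replace_currency(text, table=CURRENCY_NAMES):
    """Replace each currency symbol found in table with its full name."""
    return text.translate(str.maketrans(table))
```

A table-driven approach keeps the conversions in one place, so new characters can be added without touching the conversion logic.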

2008-09-24 14:06
0.3

Tags: Minor feature enhancements
Center and right justification of text fitting in a line of (adjustable) 80 columns. Indication of hyperlinked text along with the hyperlink. A BSD makefile. Some suggestions on how Windows users can use this tool and more documentation. docx2txt.pl invocation has been changed a little. User involvement during installation is reduced.
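Centering or right-justifying a line that fits within a fixed column width can be sketched as follows. This is a minimal Python illustration under the assumption that lines at or beyond the width are left untouched; the function name justify and its parameters are hypothetical and do not reflect the tool's actual configuration variables.

```python
def justify(line, alignment="left", width=80):
    """Align a short line within width columns; longer lines are returned unchanged."""
    if len(line) >= width:
        return line
    if alignment == "center":
        # Pad only on the left so no trailing whitespace is emitted.
        return " " * ((width - len(line)) // 2) + line
    if alignment == "right":
        return line.rjust(width)
    return line
```

The width parameter plays the role of the adjustable 80-column line length mentioned in the release note.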

Project Resources