Project Description

docx2txt is a tool that attempts to generate equivalent text files from (even corrupted) Microsoft .docx documents, preserving some formatting and document information (which MS text conversion drops) along with appropriate character conversions for a good (ASCII) text experience.

It is a platform-independent solution consisting of a core Perl script, Unix/Windows wrapper shell scripts, and a configuration file that controls the appearance of the output text to a fair extent. It depends on a command-line unzipping program (such as unzip, 7z, pkzipc, or wzunzip) that can silently extract single files from zip archives to console/standard output/pipe.
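To illustrate the idea of pulling a single file out of the archive and turning it into text, here is a minimal Python sketch. It is not how docx2txt itself is implemented (the tool is written in Perl and calls an external unzipper); Python's `zipfile` module stands in for the unzipper, and the helper names `read_member`, `xml_to_text`, and `make_sample_docx` are hypothetical. It assumes the document body is stored in the archive member `word/document.xml`, and it handles no formatting at all.

```python
import html
import io
import re
import zipfile

# Assumption: the main body of a .docx lives in this archive member.
DOCUMENT_MEMBER = "word/document.xml"


def read_member(docx_source, member=DOCUMENT_MEMBER):
    """Read one member of the Zip archive, much like `unzip -p file.docx member`."""
    with zipfile.ZipFile(docx_source) as archive:
        return archive.read(member).decode("utf-8")


def xml_to_text(xml):
    """Crude conversion: end each paragraph with a newline, then drop all tags."""
    xml = xml.replace("</w:p>", "\n")
    text = re.sub(r"<[^>]+>", "", xml)
    # Turn XML entities such as &amp; back into plain characters.
    return html.unescape(text)


def make_sample_docx():
    """Hypothetical helper: build a tiny in-memory archive for demonstration."""
    body = (
        "<w:document><w:body>"
        "<w:p><w:r><w:t>Hello &amp; welcome</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(DOCUMENT_MEMBER, body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(xml_to_text(read_member(make_sample_docx())), end="")
```

A real converter does far more than this (lists, tables, justification, character conversions); the sketch only shows the extract-then-strip pipeline.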

It can conveniently be used to build a Web-based docx document conversion service. Some Makefiles and Windows batch files are provided for easy installation of the scripts. With unzippers like CakeCmd that can deal with corrupt Zip archives, this tool can in many cases extract text from corrupt docx documents that the MS word processor fails to even open.

System Requirements

System requirements are not defined.
Information regarding Project Releases and Project Resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2009-10-05 18:21
1.0

Tags: Major feature enhancements
This release focuses mainly on user interaction aspects. The new features are a Windows installation script, a Windows wrapper script, support for using CakeCmd in addition to Unzip, a configuration file, and support for working with a directory holding the unzipped content of a .docx file. Handling of short line justification has been improved; many cases that the earlier approach missed are now captured. Path names containing spaces are now handled.
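The idea of accepting either a .docx archive or a directory holding its unzipped content can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the tool's Perl code: the function name `load_document_xml` is hypothetical, and the body is assumed to live at `word/document.xml` in both layouts. Because paths are passed as single arguments rather than through a shell, names containing spaces need no special quoting here.

```python
import os
import tempfile
import zipfile

# Assumption: the main body of a .docx lives in this archive member.
DOCUMENT_MEMBER = "word/document.xml"


def load_document_xml(path):
    """Return the raw bytes of the document body from an archive or a directory."""
    if os.path.isdir(path):
        # Directory holding the already-unzipped content of a .docx file.
        member_path = os.path.join(path, *DOCUMENT_MEMBER.split("/"))
        with open(member_path, "rb") as handle:
            return handle.read()
    # Otherwise treat the path as a .docx (Zip) archive.
    with zipfile.ZipFile(path) as archive:
        return archive.read(DOCUMENT_MEMBER)


if __name__ == "__main__":
    # Demonstration with path names that contain spaces.
    with tempfile.TemporaryDirectory(prefix="docx demo ") as workdir:
        archive_path = os.path.join(workdir, "my report.docx")
        with zipfile.ZipFile(archive_path, "w") as archive:
            archive.writestr(DOCUMENT_MEMBER, "<w:document/>")
        unzipped_dir = os.path.join(workdir, "my report")
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(unzipped_dir)
        print(load_document_xml(archive_path) == load_document_xml(unzipped_dir))
```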

Project Resources