Download List

项目描述

Yioop! is a PHP search engine. Yioop! can be configured as either a general purpose search engine for the whole Web or it can be configured to provide search results for a set of URLs or domains. Yioop can crawl pages or can directly index archives such as ARC and WARC. It supports indexing several file formats such as HTML, Atom, PDF, DOC, PPT, RTF, RSS, XML, SVG, PNG, JPG, BMP, GIF, and sitemaps. The Yioop! crawler can be deployed on one or many machines. It supports having one or more to crawl scheduler processes, as well as multiple fetchers and mirrors. Crawling respects robots.txt including Crawl-delay. Yioop! crawls are stored in a Web archive format that is easy to move around. Crawling can be done on one machine and the results deployed elsewhere. Yioop! supports mixing of crawls. Yioop! comes with a search front end that can be localized as desired using a GUI. This GUI supports RTL languages. Management of crawls can also be done using this GUI. Yioop! can be configured in a straightforward manner to make use of file caching or memcache if available.

系统要求

System requirement is not defined
Information regarding Project Releases and Project Resources. Note that the information here is a quote from Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2012-02-04 11:54 Back to release list
0.82

这通过允许多个机要保持"要爬下一步"队列的部分释放了改进的可扩展性。查询处理还可以拆分机器,用不同的机器上正在负责给定哈希的文件当中。Yioop!现在支持镜像的机器。两个字短语所确定由如维基百科 URL 转储的 XML 文件现在可以视为一个逻辑单元。Yioop!模型-视图-控制器框架取得了容易扩展和它的文件已添加到该网站。
标签: Minor
This release improved scalability by allowing multiple machines to maintain portions of the "to crawl next" queue. Query processing can also be split amongst machines, with different machines being responsible for documents of a given hash. Yioop! now supports mirroring of machines. Two word phrases as determined by an XML file such as Wikipedia URL dump can now be treated as a logical unit. The Yioop! model-view-controller framework has been made easier to extend and documentation for it has been added to the website.

Project Resources