Ticket #31786

UniList speed improvements

Open Date: 2013-07-27 06:32 Last Update: 2013-08-02 06:00

Reporter:
Owner:
Status:
Closed
Component:
Priority:
5 - Medium
Severity:
5 - Medium
Resolution:
Accepted
File:

Details

The current Unicode list generator (-U option) works by actually generating EIDS syntax for every one of the more than a million characters in the spec, parsing it, and then doing tree matching as usual. This is slow, especially in the common case of a query that matches only one character. We should:

a. Detect the case where the query's root node has a head. Since every tree in the dictionary has a head, we know exactly one entry will match (unless the head is more than one character, in which case there is no match at all), and we know which one, so we could just generate that one entry.

b. In other cases, generate trees instead of generating EIDS syntax and then pass them directly to the matcher instead of going to and from UTF-8 byte streams. Generating trees requires more code, but it will save a lot of expensive parsing.
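A minimal sketch of how (a) and (b) could fit together, written in Python for illustration only. The `Tree` class, its fields, and the function names are hypothetical stand-ins, not the tool's actual data structures or API, and the exact shape of a generated entry and the code point range covered by -U are assumptions here.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


@dataclass
class Tree:
    """Hypothetical stand-in for a parsed EIDS tree node."""
    head: Optional[str] = None   # None means the node has no head
    functor: str = ";"           # placeholder functor (assumption)
    children: list = field(default_factory=list)


def unilist_entry(code_point: int) -> Tree:
    """Idea (b): build the entry for one character directly as a tree,
    with no EIDS text generated and no parsing step."""
    return Tree(head=chr(code_point))


def candidate_entries(query: Tree) -> Iterator[Tree]:
    """Yield only the entries that the matcher needs to look at."""
    if query.head is not None:
        # Idea (a): the query root has a head, so at most one entry matters.
        if len(query.head) == 1:
            yield unilist_entry(ord(query.head))
        # A head longer than one character matches no entry: yield nothing.
        return
    # No head on the query root: fall back to the full scan, still as trees.
    for code_point in range(MAX_CODE_POINT + 1):
        yield unilist_entry(code_point)
```

In this sketch, a query whose root head is a single character yields one tree, a multi-character head yields none, and only headless queries pay for the full scan. The yielded trees would go straight to the matcher.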

Ticket History (2/2 Histories)

2013-07-27 06:32 Updated by: mskala
  • New Ticket "UniList speed improvements" created
2013-08-02 06:00 Updated by: mskala
  • Resolution Update from (none) to Accepted
  • Status Update from Open to Closed
  • Ticket Close date is changed to 2013-08-02 06:00

Attachment File List

No attachments
