The current Unicode list generator (-U option) works by generating EIDS syntax for each of the more than a million code points in the Unicode code space, parsing it, and then doing tree matching as usual. This is slow, especially in the common case of a query that matches only one character. We should:
a. Detect the case where the query's root node has a head. Since every tree in the dictionary has a head, at most one entry can match, and we know which one, so we can generate just that entry. (If the head is more than one character long, nothing can match at all.)
b. In other cases, generate trees directly and pass them to the matcher, instead of generating EIDS syntax and going to and from UTF-8 byte streams. Generating trees requires more code, but it will save a lot of expensive parsing.
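
A rough sketch of how (a) and (b) could fit together, written in Python purely for illustration. Node, entry_for, and candidate_entries are hypothetical names, and the shape of a generated entry (a head with no children) is an assumption, not the real -U output. The candidates would still be handed to the normal matcher afterwards.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional

MAX_CODE_POINT = 0x10FFFF  # last code point in the Unicode code space


@dataclass
class Node:
    """Hypothetical stand-in for a parsed tree node."""
    head: Optional[str] = None
    functor: str = ";"
    children: list = field(default_factory=list)


def entry_for(cp: int) -> Node:
    """Build the tree for one code point directly, with no text round trip.

    The entry contents here are a placeholder assumption.
    """
    return Node(head=chr(cp))


def candidate_entries(
    query: Node,
    code_points: Iterable[int] = range(MAX_CODE_POINT + 1),
) -> Iterator[Node]:
    """Yield only the entries that could possibly match the query."""
    if query.head is not None:
        # Case (a): a head on the query root pins down the single candidate.
        # A head longer than one character cannot match any entry.
        if len(query.head) == 1 and ord(query.head) in code_points:
            yield entry_for(ord(query.head))
        return
    # Case (b): no head, so every entry is a candidate, but each one is
    # built as a tree instead of being generated as EIDS text and parsed.
    for cp in code_points:
        yield entry_for(cp)
```

With this shape, a query whose root has a head costs one tree construction instead of a pass over the whole code space, and the headless case still visits every code point but skips the generate-and-parse step.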