Allow a file that in some way lists kanji (EIDS heads) to be specified on the command line, so that "is on the list" becomes available as a matching predicate. The killer app: I'd like to be able to search KanjiVG or CHISE for kanji that are not in Tsukurimashou but are [lr] (left-right) or [tb] (top-bottom) combinations of kanji that are in Tsukurimashou.
Detailed proposal: add one or more command-line options that specify user-defined matching predicates. These might include: a literal string in EIDS match-pattern syntax, matching anything the string does; a text file name, matching any non-blank character in the file, or any line; a font file (OTF format, other?), matching any character that is defined in the font.
Add a new matching operator, unary .user. or .#. (with sugary brackets), which reads the head of its child as a decimal number (taken to be 1 if it is not otherwise valid) and uses that number as the 1-based index into the list of user-defined matching predicates that have been specified. Using the head of the child means that in the common case of no more than nine user-defined predicates, we can use simple "#1", "#2", "#3" syntax, which resembles argument substitution in languages such as TeX.
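The index rule for the proposed operator can be sketched in a few lines of Python. This is only an illustration of the rule as worded above, not code from idsgrep: the function names are hypothetical, "valid" is assumed to mean a non-empty string of ASCII digits with a value of at least 1, and the handling of an index beyond the end of the list is a guess, since the proposal does not say what should happen there.

```python
def resolve_user_predicate_index(head):
    """Return the 1-based predicate index named by the child's head.

    Assumption: a head is "valid" only if it is a non-empty string of
    ASCII digits whose value is at least 1; anything else counts as 1.
    """
    if head is not None and head.isascii() and head.isdigit() and int(head) >= 1:
        return int(head)
    return 1


def select_user_predicate(predicates, head):
    """Pick the user-defined predicate that the operator's child refers to.

    `predicates` is the list built from the command-line options, in the
    order they were given. Returning None for an out-of-range index is an
    assumption; the proposal leaves that case open.
    """
    index = resolve_user_predicate_index(head)
    if index > len(predicates):
        return None
    return predicates[index - 1]  # convert the 1-based index to 0-based
```

With two predicates specified, a child head of "2" would select the second one, while a missing or non-numeric head would fall back to the first.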
Example of use:
idsgrep --user-predicate Tsukurimashou.otf -dkanjivg '&!#1|[lr]#1#1[tb]#1#1'
(beware editing this - the Wiki software here will garble it if not carefully escaped with extra exclamation points)
That would mean "match, in the KanjiVG dictionary, any kanji that is not in the font file and is either an [lr] or a [tb] combination of two kanji that are in the font file." Such a query would return kanji that might easily be added to the font.
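To make the intended meaning of the query concrete, here is a small Python sketch that applies the same logic to toy data. Everything in it is an assumption made for illustration: the dictionary entries are hand-written rather than taken from KanjiVG, the font is a plain set of characters, and each child is reduced to its head, which is much simpler than real EIDS tree matching.

```python
# Hypothetical toy data, hand-written for illustration; not taken from KanjiVG.
# Each entry maps a kanji to (operator, first child head, second child head),
# or to None when the kanji has no binary decomposition.
TOY_DICTIONARY = {
    "木": None,
    "林": ("lr", "木", "木"),
    "明": ("lr", "日", "月"),
    "炎": ("tb", "火", "火"),
    "森": ("tb", "木", "林"),
}

# Stand-in for the characters defined in the font file (user predicate 1).
TOY_FONT = {"木", "日", "月", "火", "明"}


def in_font(head):
    """Toy version of predicate #1: is this character in the font?"""
    return head in TOY_FONT


def easy_additions(dictionary, user1):
    """Mimic the query '&!#1|[lr]#1#1[tb]#1#1' on the toy dictionary."""
    results = []
    for kanji, decomposition in dictionary.items():
        if user1(kanji):
            continue  # fails "!#1": the kanji is already in the font
        if decomposition is None:
            continue  # nothing for [lr] or [tb] to match against
        op, first, second = decomposition
        # "[lr]#1#1" or "[tb]#1#1": both children must satisfy predicate 1.
        if op in ("lr", "tb") and user1(first) and user1(second):
            results.append(kanji)
    return results
```

On this toy data, `easy_additions(TOY_DICTIONARY, in_font)` returns 林 and 炎. 明 is skipped because it is already in the font, and 森 is skipped because one of its parts, 林, is not.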