Line wrapping in IDSgrep cooked mode
Basically works, but the build system still needs a number of updates to deal with the new generated source file that's involved in calculating the widths of Unicode characters.
Closing ticket - will pick up the pieces as they are noticed.
Have a feature (which would probably become the default in cases where "cooked" mode is active) to break lines in a way that makes them nicer to read. I envision simply testing, at each point where EIDS syntax will ignore whitespace (i.e. places where it requires an opening bracket, sugar, or syrup), whether the text to print before the next such boundary or forced line break is enough to make the line longer than about 76 characters, and if so, inserting a newline and a couple of spaces. Many dictionary entries may contain more than 80 characters inside a single string, and in that case we can't add a line break without breaking the semantics, but this would at least tend to split the definition part of EDICT2-based dictionary entries onto a new line in a way that looks good in mock-up, especially when combined with the prototype syntax-colouring feature. It could be activated by setting the "tab" element of the cooking recipe string to "9."
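To make the proposed breaking rule concrete, here is a rough sketch in C. It is not IDSgrep code: wrap_chunks, chunk_width, WRAP_LIMIT and WRAP_INDENT are hypothetical names, the output is assumed to arrive already split into chunks at the points where EIDS ignores whitespace, and widths are counted in bytes, which is only right for ASCII. Forced line breaks inside a chunk are not handled.

```c
#include <stddef.h>
#include <string.h>

#define WRAP_LIMIT 76     /* "about 76 characters", per the description */
#define WRAP_INDENT "  "  /* "a couple of spaces" after an inserted newline */

/* Hypothetical width function: this sketch counts bytes, which is only
 * correct for ASCII.  Real code would need true column widths. */
static size_t chunk_width(const char *s)
{
    return strlen(s);
}

/* Concatenate n chunks into out, inserting newline plus indent before any
 * chunk that would push the current line past WRAP_LIMIT.  A chunk that is
 * too long on its own is emitted unbroken.  Returns the number of breaks
 * inserted, or -1 if out is too small. */
static int wrap_chunks(const char **chunks, size_t n, char *out, size_t outsz)
{
    size_t used = 0, col = 0, i;
    size_t ilen = strlen(WRAP_INDENT);
    int line_has_text = 0, breaks = 0;

    if (outsz == 0)
        return -1;
    for (i = 0; i < n; i++) {
        size_t len = strlen(chunks[i]);
        size_t w = chunk_width(chunks[i]);

        /* Break only between chunks, and never on an empty line. */
        if (line_has_text && col + w > WRAP_LIMIT) {
            if (used + 1 + ilen >= outsz)
                return -1;
            out[used++] = '\n';
            memcpy(out + used, WRAP_INDENT, ilen);
            used += ilen;
            col = ilen;
            line_has_text = 0;
            breaks++;
        }
        if (used + len >= outsz)
            return -1;
        memcpy(out + used, chunks[i], len);
        used += len;
        col += w;
        if (len > 0)
            line_has_text = 1;
    }
    out[used] = '\0';
    return breaks;
}
```

With a 50-column chunk followed by a 30-column chunk, the second chunk would overflow the limit, so it goes on a new line after two spaces. A single chunk longer than the limit is left alone, matching the case of a long string that can't be split without breaking the semantics.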
Unfortunately, there is a big hairy yak lurking: the question of how many columns wide a Unicode character actually is. Our text includes lots of both wide and narrow characters and we need to get it right. Unicode's guidance on this in Standard Annex #11 and the associated database column is disappointing: they specify "wide" or "narrow" for many characters (split into sub-categories depending on the history of how the classification decision was made), but also "ambiguous" for many, and Unicode tells you to decide from context for those, with some vague recommendations that ultimately lead to "narrow" being the default for new designs. Isolated "ambiguous" characters exist in the middle of blocks that are otherwise all "wide" with no clear reason for them to be treated differently; isolated "ambiguous" characters also exist in the middle of blocks that are all "narrow" with no clear reason for them to be treated differently. Neither "wide" nor "narrow" is really acceptable as a default for ALL the "ambiguous" characters.
Tentative plan: write a separate C program that reads the Unicode database, plus some extra lines that are allowed to override it, and constructs a BDD that recognizes encoded characters that should be wide. Spit this BDD out as new C code (which can be included in the distribution, for users who don't have the BDD library to regenerate it) and use it to compute character widths. Seems like overkill, but I'm not seeing a much better way to compute this function in a flexible and maintainable way.
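The "database plus override lines" part of the plan can be illustrated without the BDD. The sketch below swaps in a plain range table with a last-match-wins lookup, purely to show the override semantics; it is not the BDD construction and not the generator itself. It assumes the input lines look like those in Unicode's EastAsianWidth.txt (`code;class` or `lo..hi;class`, optional spaces around the semicolon, `#` comments) and that override lines use the same format. The names struct width_range, parse_width_line and is_wide are hypothetical, as is the choice to treat classes W and F as wide and everything else as narrow.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One parsed line: an inclusive code point range and whether it is wide. */
struct width_range {
    unsigned long lo, hi;
    int wide;
};

/* Parse one line in the assumed "code;class" or "lo..hi;class" format.
 * Returns 1 and fills *r on success, 0 for comments, blank lines, or
 * anything else that does not match. */
static int parse_width_line(const char *line, struct width_range *r)
{
    unsigned long lo, hi;
    char cls[3];

    if (sscanf(line, "%lx..%lx ; %2[A-Za-z]", &lo, &hi, cls) == 3) {
        /* range form, lo and hi already set */
    } else if (sscanf(line, "%lx ; %2[A-Za-z]", &lo, cls) == 2) {
        hi = lo;  /* single code point */
    } else {
        return 0;
    }
    r->lo = lo;
    r->hi = hi;
    /* Assumption of this sketch: W and F mean wide, all else narrow. */
    r->wide = (strcmp(cls, "W") == 0 || strcmp(cls, "F") == 0);
    return 1;
}

/* Look up a code point, scanning from the last entry backwards so that
 * lines read later (the overrides) win.  Unlisted code points are narrow. */
static int is_wide(const struct width_range *tab, size_t n, unsigned long cp)
{
    while (n-- > 0) {
        if (cp >= tab[n].lo && cp <= tab[n].hi)
            return tab[n].wide;
    }
    return 0;
}
```

Reading the database lines first and the override lines afterwards into one table would then let an override such as `00A1;W` reclassify an individual "ambiguous" character. The linear scan is only for illustration; compiling the decision into generated C code is the part the plan above is actually about.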