argra****@users*****
Sun Jul 22 19:02:26 JST 2018
Index: docs/perl/5.12.1/perlunicode.pod
diff -u docs/perl/5.12.1/perlunicode.pod:1.5 docs/perl/5.12.1/perlunicode.pod:1.6
--- docs/perl/5.12.1/perlunicode.pod:1.5 Tue Apr 2 04:18:15 2013
+++ docs/perl/5.12.1/perlunicode.pod Sun Jul 22 19:02:26 2018
@@ -267,11 +267,12 @@

 =end original

-The C<use feature 'unicode_strings'> pragma is intended to always, regardless
-of platform, force Unicode semantics in a particular lexical scope. In
-release 5.12, it is partially implemented, applying only to case changes.
-See L</The "Unicode Bug"> below.
-(TBT)
+C<use feature 'unicode_strings'> プラグマは、プラットフォームに関わらず常に
+特定のレキシカルスコープで Unicode セマンティクスを強制することを
+意図しています。
+リリース 5.12 では、これは部分的に実装されていて、大文字小文字変更にのみ
+適用されます。
+後述する L</The "Unicode Bug"> を参照してください。

 =begin original

@@ -417,12 +418,11 @@

 =end original

-Alternatively, you can use the C<\x{...}> notation for characters 0x100 and
-above. For characters below 0x100 you may get byte semantics instead of
-character semantics; see L</The "Unicode Bug">. On EBCDIC machines there is
-the additional problem that the value for such characters gives the EBCDIC
-character rather than the Unicode one.
-(TBT)
+あるいは、0x100 以上の文字については C<\x{...}> 記法が使えます。
+0x100 より小さい文字については文字セマンティクスではなくバイトセマンティクスを
+使います; L</The "Unicode Bug"> を参照してください。
+EBCDIC マシンでは、このような文字の値が Unicode のものではなく
+EBCDIC のものになるという追加の問題があります。

 =begin original

@@ -499,11 +499,10 @@

 =end original

-Named Unicode properties, scripts, and block ranges may be used like
-character classes via the C<\p{}> "matches property" construct and
-the C<\P{}> negation, "doesn't match property".
-See L</"Unicode Character Properties"> for more details.
-(TBT)
+名前付き Unicode 特性、用字、ブロック範囲は、
+C<\p{}> 「特性にマッチング」構文および否定である C<\P{}>
+「特性にマッチングしない」を使って文字クラスのように使えます。
+さらなる詳細については L</"Unicode Character Properties"> を参照してください。

 =begin original

@@ -513,10 +512,10 @@

 =end original

-You can define your own character properties and use them
-in the regular expression with the C<\p{}> or C<\P{}> construct.
-See L</"User-Defined Character Properties"> for more details.
-(TBT)
+独自の文字特性を定義して、C<\p{}> と C<\P{}> 構文によって
+正規表現でそれらを使うことができます。
+さらなる詳細については L</"User-Defined Character Properties"> を
+参照してください。

 =item *

@@ -755,11 +754,14 @@
 =end original

 This formality is needed when properties are not binary, that is if they can
-take on more values than just True and False. For example, the Bidi_Class (see
+take on more values than just True and False.
+For example, the Bidi_Class (see
 L</"Bidirectional Character Types"> below), can take on a number of different
-values, such as Left, Right, Whitespace, and others. To match these, one needs
+values, such as Left, Right, Whitespace, and others.
+To match these, one needs
 to specify the property name (Bidi_Class), and the value being matched against
-(Left, Right, I<etc.>). This is done, as in the examples above, by having the
+(Left, Right, I<etc.>).
+This is done, as in the examples above, by having the
 two components separated by an equal sign (or interchangeably, a colon), like
 C<\p{Bidi_Class: Left}>.
 (TBT)
@@ -803,17 +805,23 @@

 Most Unicode character properties have at least two synonyms (or aliases if you
 prefer), a short one that is easier to type, and a longer one which is more
-descriptive and hence it is easier to understand what it means. Thus the "L"
-and "Letter" above are equivalent and can be used interchangeably. Likewise,
+descriptive and hence it is easier to understand what it means.
+Thus the "L"
+and "Letter" above are equivalent and can be used interchangeably.
+Likewise,
 "Upper" is a synonym for "Uppercase", and we could have written
-C<\p{Uppercase}> equivalently as C<\p{Upper}>. Also, there are typically
-various synonyms for the values the property can be. For binary properties,
+C<\p{Uppercase}> equivalently as C<\p{Upper}>.
+Also, there are typically
+various synonyms for the values the property can be.
+For binary properties,
 "True" has 3 synonyms: "T", "Yes", and "Y"; and "False has correspondingly "F",
-"No", and "N". But be careful. A short form of a value for one property may
-not mean the same thing as the same short form for another. Thus, for the
-General_Category property, "L" means "Letter", but for the Bidi_Class property,
-"L" means "Left". A complete list of properties and synonyms is in
-L<perluniprops>.
+"No", and "N".
+But be careful.
+A short form of a value for one property may
+not mean the same thing as the same short form for another.
+Thus, for the General_Category property, "L" means "Letter",
+but for the Bidi_Class property, "L" means "Left".
+A complete list of properties and synonyms is in L<perluniprops>.
 (TBT)

 =begin original
@@ -837,14 +845,19 @@
 Upper/lower case differences in the property names and values are irrelevant,
 thus C<\p{Upper}> means the same thing as C<\p{upper}> or even C<\p{UpPeR}>.
 Similarly, you can add or subtract underscores anywhere in the middle of a
-word, so that these are also equivalent to C<\p{U_p_p_e_r}>. And white space
-is irrelevant adjacent to non-word characters, such as the braces and the equals
-or colon separators so C<\p{ Upper }> and C<\p{ Upper_case : Y }> are
-equivalent to these as well. In fact, in most cases, white space and even
-hyphens can be added or deleted anywhere. So even C<\p{ Up-per case = Yes}> is
-equivalent. All this is called "loose-matching" by Unicode. The few places
-where stricter matching is employed is in the middle of numbers, and the Perl
-extension properties that begin or end with an underscore. Stricter matching
+word, so that these are also equivalent to C<\p{U_p_p_e_r}>.
+And white space is irrelevant adjacent to non-word characters,
+such as the braces and the equals or colon separators
+so C<\p{ Upper }> and C<\p{ Upper_case : Y }> are
+equivalent to these as well.
+In fact, in most cases, white space and even
+hyphens can be added or deleted anywhere.
+So even C<\p{ Up-per case = Yes}> is equivalent.
+All this is called "loose-matching" by Unicode.
+The few places where stricter matching is employed is
+in the middle of numbers, and the Perl extension properties that
+begin or end with an underscore.
+Stricter matching
 cares about white space (except adjacent to the non-word characters) and
 hyphens, and non-interior underscores.
 (TBT)
@@ -871,10 +884,9 @@

 =end original

-Every Unicode character is assigned a general category, which is the "most
-usual categorization of a character" (from
-L<http://www.unicode.org/reports/tr44>).
-(TBT)
+全ての Unicode 文字は一つの一般カテゴリに割り当てられています;
+これは「その文字の最も普通のカテゴライズ」
+(L<http://www.unicode.org/reports/tr44> より)です。

 =begin original

@@ -885,11 +897,11 @@

 =end original

-The compound way of writing these is like C<\p{General_Category=Number}>
-(short, C<\p{gc:n}>). But Perl furnishes shortcuts in which everything up
-through the equal or colon separator is omitted. So you can instead just write
-C<\pN>.
-(TBT)
+これらを書く複合的な方法は C<\p{General_Category=Number}>
+(短縮形は C<\p{gc:n}>) のようなものです。
+Perl は等号またはコロンの区切り文字までの全てを省略できる機能を
+提供しています。
+従って、代わりに単に C<\pN> と書けます。

 =begin original

@@ -1037,10 +1049,11 @@

 =end original

-The world's languages are written in a number of scripts. This sentence
-(unless you're reading it in translation) is written in Latin, while Russian is
-written in Cyrllic, and Greek is written in, well, Greek; Japanese mainly in
-Hiragana or Katakana. There are many more.
+The world's languages are written in a number of scripts.
+This sentence (unless you're reading it in translation) is written in Latin,
+while Russian is written in Cyrllic, and Greek is written in, well, Greek;
+Japanese mainly in Hiragana or Katakana.
+There are many more.
 (TBT)

 =begin original
@@ -1055,9 +1068,10 @@

 The Unicode Script property gives what script a given character is in, and can
 be matched with the compound form like C<\p{Script=Hebrew}> (short:
-C<\p{sc=hebr}>). Perl furnishes shortcuts for all script names. You can omit
-everything up through the equals (or colon), and simply write C<\p{Latin}> or
-C<\P{Cyrillic}>.
+C<\p{sc=hebr}>).
+Perl furnishes shortcuts for all script names.
+You can omit everything up through the equals (or colon),
+and simply write C<\p{Latin}> or C<\P{Cyrillic}>.
 (TBT)

 =begin original
@@ -1130,9 +1144,9 @@

 =end original

-For more about scripts versus blocks, see UAX#24 "Unicode Script Property":
-L<http://www.unicode.org/reports/tr24>
-(TBT)
+用字とブロックに違いに関する詳細については、
+UAX#24 "Unicode Script Property"
+L<http://www.unicode.org/reports/tr24> を参照してください。

 =begin original

@@ -1142,10 +1156,8 @@

 =end original

-The Script property is likely to be the one you want to use when processing
-natural language; the Block property may be useful in working with the nuts and
-bolts of Unicode.
-(TBT)
+用字特性は自然言語を処理するときにおそらく使いたいと思うようなものです;
+ブロック特性は Unicode の基本的な部分で動作させるのに有用です。

 =begin original

@@ -1199,11 +1211,12 @@

 =end original

-It is unstable. A new version of Unicode may pre-empt the current meaning by
-creating a property with the same name. There was a time in very early Unicode
-releases when C<\p{Hebrew}> would have matched the I<block> Hebrew; now it
-doesn't.
-(TBT)
+不安定です。
+新しいバージョンの Unicode は、同じ名前の特性を作ることで現在の意味を
+変えることがあります。
+とても初期の Unicode リリースでは
+C<\p{Hebrew}> がヘブライ I<ブロック> にマッチングしていた時期がありました;
+今はマッチングしません。

 =back

@@ -1216,11 +1229,10 @@

 =end original

-Some people just prefer to always use C<\p{Block: foo}> and C<\p{Script: bar}>
-instead of the shortcuts, for clarity, and because they can't remember the
-difference between 'In' and 'Is' anyway (or aren't confident that those who
-eventually will read their code will know).
-(TBT)
+一部の人々は、明確化のため、および 'In' と 'Is' の違いを覚えていられない
+(あるいは最終的にコードを読む人々が知っているか自信がない)という理由で、
+ショートカットではなく常に C<\p{Block: foo}> や C<\p{Script: bar}> を
+使うのを好みます。

 =begin original

@@ -1255,9 +1267,11 @@
 =end original

 Unicode defines all its properties in the compound form, so all single-form
-properties are Perl extensions. A number of these are just synonyms for the
+properties are Perl extensions.
+A number of these are just synonyms for the
 Unicode ones, but some are genunine extensions, including a couple that are in
-the compound form. And quite a few of these are actually recommended by Unicode
+the compound form.
+And quite a few of these are actually recommended by Unicode
 (in L<http://www.unicode.org/reports/tr18>).
 (TBT)

@@ -1364,20 +1378,24 @@

 =end original

-To understand the use of this rarely used property=value combination, it is
-necessary to know some basics about decomposition.
-Consider a character, say H. It could appear with various marks around it,
+このめったに使われない property=value の組の使い方を理解するために、
+分解に関するいくつかの基本を知る必要があります。
+一つの文字、例えば H について考えてみます。
+It could appear with various marks around it,
 such as an acute accent, or a circumflex, or various hooks, circles, arrows,
-I<etc.>, above, below, to one side and/or the other, I<etc.> There are many
-possibilities among the world's languages. The number of combinations is
-astronomical, and if there were a character for each combination, it would
-soon exhaust Unicode's more than a million possible characters. So Unicode
-took a different approach: there is a character for the base H, and a
+I<etc.>, above, below, to one side and/or the other, I<etc.>
+世界中のの言語の中では多くの可能性があります。
+組み合わせの数は天文学的で、
+and if there were a character for each combination, it would
+soon exhaust Unicode's more than a million possible characters.
+それで Unicode は異なる手法を取りました:
+there is a character for the base H, and a
 character for each of the possible marks, and they can be combined variously
-to get a final logical character. So a logical character--what appears to be a
-single character--can be a sequence of more than one individual characters.
-This is called an "extended grapheme cluster". (Perl furnishes the C<\X>
-construct to match such sequences.)
+to get a final logical character.
+それで一つの論理文字--単一の文字として現れるもの--は
+複数の独立した文字の並びになることがあります。
+これは「拡張書記素クラスタ」("extended grapheme cluster")と呼ばれます。
+(Perl はこのような並びにマッチングする C<\X> 構文を用意しています。)
 (TBT)

 =begin original
@@ -1395,10 +1413,13 @@

 But Unicode's intent is to unify the existing character set standards and
 practices, and a number of pre-existing standards have single characters that
-mean the same thing as some of these combinations. An example is ISO-8859-1,
+mean the same thing as some of these combinations.
+An example is ISO-8859-1,
 which has quite a few of these in the Latin-1 range, an example being "LATIN
-CAPITAL LETTER E WITH ACUTE". Because this character was in this pre-existing
-standard, Unicode added it to its repertoire. But this character is considered
+CAPITAL LETTER E WITH ACUTE".
+Because this character was in this pre-existing
+standard, Unicode added it to its repertoire.
+But this character is considered
 by Unicode to be equivalent to the sequence consisting of first the character
 "LATIN CAPITAL LETTER E", then the character "COMBINING ACUTE ACCENT".
 (TBT)
@@ -1413,9 +1434,10 @@
 =end original

 "LATIN CAPITAL LETTER E WITH ACUTE" is called a "pre-composed" character, and
-the equivalence with the sequence is called canonical equivalence. All
-pre-composed characters are said to have a decomposition (into the equivalent
-sequence) and the decomposition type is also called canonical.
+the equivalence with the sequence is called canonical equivalence.
+All pre-composed characters are said to have a decomposition
+(into the equivalent sequence) and
+the decomposition type is also called canonical.
 (TBT)

 =begin original
@@ -1433,13 +1455,15 @@

 =end original

-However, many more characters have a different type of decomposition, a
-"compatible" or "non-canonical" decomposition. The sequences that form these
-decompositions are not considered canonically equivalent to the pre-composed
-character. An example, again in the Latin-1 range, is the "SUPERSCRIPT ONE".
-It is kind of like a regular digit 1, but not exactly; its decomposition
-into the digit 1 is called a "compatible" decomposition, specifically a
-"super" decomposition. There are several such compatibility
+However, many more characters have a different type of decomposition,
+a "compatible" or "non-canonical" decomposition.
+The sequences that form these decompositions are not
+considered canonically equivalent to the pre-composed character.
+An example, again in the Latin-1 range, is the "SUPERSCRIPT ONE".
+It is kind of like a regular digit 1, but not exactly;
+its decomposition into the digit 1 is called a "compatible" decomposition,
+specifically a "super" decomposition.
+There are several such compatibility
 decompositions (see L<http://www.unicode.org/reports/tr44>), including one
 called "compat" which means some miscellaneous type of decomposition that
 doesn't fit into the decomposition categories that Unicode has chosen.
@@ -1452,9 +1476,7 @@

 =end original

-Note that most Unicode characters don't have a decomposition, so their
-decomposition type is "None".
-(TBT)
+ほとんどの Unicode 文字は分解を持たないので、それらの分解型は "None" です。

 =begin original

@@ -1463,9 +1485,8 @@

 =end original

-Perl has added the C<Non_Canonical> type, for your convenience, to mean any of
-the compatibility decompositions.
-(TBT)
+Perl は便利なように C<Non_Canonical> 型を追加しています;
+これは任意の互換分解を意味します。

 =item B<C<\p{Graph}>>

@@ -1476,9 +1497,8 @@

 =end original

-Matches any character that is graphic. Theoretically, this means a character
-that on a printer would cause ink to be used.
-(TBT)
+任意の図形文字にマッチングします。
+理論的には、これはプリンタがインクを使うことになる文字を意味します。

 =item B<C<\p{HorizSpace}>>

@@ -1489,9 +1509,8 @@

 =end original

-This is the same as C<\h> and C<\p{Blank}>: A character that changes the
-spacing horizontally.
-(TBT)
+これは C<\h> や C<\p{Blank}> と同じです:
+スペースを垂直に変更するものです。

 =item B<C<\p{In=*}>>

@@ -1511,8 +1530,7 @@

 =end original

-This is the same as C<\s>, restricted to ASCII, namely C<S<[ \f\n\r\t]>>.
-(TBT)
+これは C<\s> と同じで、ASCII に制限されます; つまり C<S<[ \f\n\r\t]>> です。

 =begin original

@@ -1530,8 +1548,7 @@

 =end original

-This is the same as C<\w>, restricted to ASCII, namely C<[A-Za-z0-9_]>
-(TBT)
+これは C<\w> と同じで ASCII に制限されます; つまり C<[A-Za-z0-9_]> です。

 =begin original

@@ -1550,9 +1567,8 @@

 =end original

-This matches any alphanumeric character in the ASCII range, namely
-C<[A-Za-z0-9]>.
-(TBT)
+これは ASCII の範囲の任意の英数字にマッチングします; つまり
+C<[A-Za-z0-9]> です。

 =item B<C<\p{PosixAlpha}>>

@@ -1562,8 +1578,8 @@

 =end original

-This matches any alphabetic character in the ASCII range, namely C<[A-Za-z]>.
-(TBT)
+これは ASCII の範囲の任意の英字にマッチングします; つまり
+C<[A-Za-z]> です。

 =item B<C<\p{PosixBlank}>>

@@ -1573,8 +1589,8 @@

 =end original

-This matches any blank character in the ASCII range, namely C<S<[ \t]>>.
-(TBT)
+これは ASCII の範囲の任意の空白文字にマッチングします; つまり
+C<S<[ \t]>> です。

 =item B<C<\p{PosixCntrl}>>

@@ -1584,8 +1600,8 @@

 =end original

-This matches any control character in the ASCII range, namely C<[\x00-\x1F\x7F]>
-(TBT)
+これは ASCII の範囲の任意の制御文字にマッチングします; つまり
+C<[\x00-\x1F\x7F]> です。

 =item B<C<\p{PosixDigit}>>

@@ -1595,8 +1611,8 @@

 =end original

-This matches any digit character in the ASCII range, namely C<[0-9]>.
-(TBT)
+これは ASCII の範囲の任意の数字にマッチングします; つまり
+C<[0-9]> です。

 =item B<C<\p{PosixGraph}>>

@@ -1606,8 +1622,8 @@

 =end original

-This matches any graphical character in the ASCII range, namely C<[\x21-\x7E]>.
-(TBT)
+これは ASCII の範囲の任意の図形文字にマッチングします; つまり
+C<[\x21-\x7E]> です。

 =item B<C<\p{PosixLower}>>

@@ -1617,8 +1633,8 @@

 =end original

-This matches any lowercase character in the ASCII range, namely C<[a-z]>.
-(TBT)
+これは ASCII の範囲の任意の小文字にマッチングします; つまり
+C<[a-z]> です。

 =item B<C<\p{PosixPrint}>>

@@ -1629,9 +1645,9 @@

 =end original

-This matches any printable character in the ASCII range, namely C<[\x20-\x7E]>.
-These are the graphical characters plus SPACE.
-(TBT)
+これは ASCII の範囲の任意の表示文字にマッチングします; つまり
+C<[\x20-\x7E]> です。
+これは図形文字に SPACE を加えたものです。

 =item B<C<\p{PosixPunct}>>

@@ -1645,12 +1661,11 @@

 =end original

-This matches any punctuation character in the ASCII range, namely
-C<[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]>. These are the
-graphical characters that aren't word characters. Note that the Posix standard
-includes in its definition of punctuation, those characters that Unicode calls
-"symbols."
-(TBT)
+これは ASCII の範囲の任意の句読点文字にマッチングします; つまり
+C<[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]> です。
+これらは単語文字でない図形文字です。
+Posix 標準は句読点の定義を含んでいて、Unicode はこれらの文字を
+「シンボル」と呼んでいることに注意してください。

 =item B<C<\p{PosixSpace}>>

@@ -1661,9 +1676,8 @@

 =end original

-This matches any space character in the ASCII range, namely
-C<S<[ \f\n\r\t\x0B]>> (the last being a vertical tab).
-(TBT)
+これは ASCII の範囲の任意の空白文字にマッチングします; つまり
+C<S<[ \f\n\r\t\x0B]>> です (最後のものは垂直タブです)。

 =item B<C<\p{PosixUpper}>>

@@ -1673,8 +1687,7 @@

 =end original

-This matches any uppercase character in the ASCII range, namely C<[A-Z]>.
-(TBT)
+これは ASCII の範囲の任意の大文字にマッチングします; つまり C<[A-Z]> です。

 =item B<C<\p{Present_In: *}>> (Short: C<\p{In=*}>)

@@ -1685,9 +1698,7 @@

 =end original

-This property is used when you need to know in what Unicode version(s) a
-character is.
-(TBT)
+この特性は、この文字の Unicode バージョンを知る必要があるときに使われます。

 =begin original

@@ -1699,10 +1710,12 @@

 =end original

-The "*" above stands for some two digit Unicode version number, such as
-C<1.1> or C<4.0>; or the "*" can also be C<Unassigned>. This property will
+前述の "*" は、C<1.1> や C<4.0> のような 2 桁の Unicode バージョン番号です;
+あるいは "*" は C<Unassigned> も取ります。
+This property will
 match the code points whose final disposition has been settled as of the
-Unicode release given by the version number; C<\p{Present_In: Unassigned}>
+Unicode release given by the version number;
+C<\p{Present_In: Unassigned}>
 will match those code points whose meaning has yet to be assigned.
 (TBT)

@@ -1717,10 +1730,12 @@
 =end original

 For example, C<U+0041> "LATIN CAPITAL LETTER A" was present in the very first
-Unicode release available, which is C<1.1>, so this property is true for all
-valid "*" versions. On the other hand, C<U+1EFF> was not assigned until version
-5.1 when it became "LATIN SMALL LETTER Y WITH LOOP", so the only "*" that
-would match it are 5.1, 5.2, and later.
+Unicode release available, which is C<1.1>,
+so this property is true for all
+valid "*" versions.
+On the other hand, C<U+1EFF> was not assigned until version
+5.1 when it became "LATIN SMALL LETTER Y WITH LOOP",
+so the only "*" that would match it are 5.1, 5.2, and later.
 (TBT)

 =begin original
@@ -1733,11 +1748,12 @@

 =end original

-Unicode furnishes the C<Age> property from which this is derived. The problem
-with Age is that a strict interpretation of it (which Perl takes) has it
-matching the precise release a code point's meaning is introduced in. Thus
-C<U+0041> would match only 1.1; and C<U+1EFF> only 5.1. This is not usually what
-you want.
+Unicode furnishes the C<Age> property from which this is derived.
+The problem with Age is that a strict interpretation of it
+(which Perl takes) has
+it matching the precise release a code point's meaning is introduced in.
+Thus C<U+0041> would match only 1.1; and C<U+1EFF> only 5.1.
+This is not usually what you want.
 (TBT)

 =begin original
@@ -1766,9 +1782,11 @@

 Another confusion with both these properties is that the definition is not
 that the code point has been assigned, but that the meaning of the code point
-has been determined. This is because 66 code points will always be
+has been determined.
+This is because 66 code points will always be
 unassigned, and, so the Age for them is the Unicode version the decision to
-make them so was made in. For example, C<U+FDD0> is to be permanently
+make them so was made in.
+For example, C<U+FDD0> is to be permanently
 unassigned to a character, and the decision to do that was made in version 3.1,
 so C<\p{Age=3.1}> matches this character and C<\p{Present_In: 3.1}> and up
 matches as well.
@@ -1782,8 +1800,7 @@

 =end original

-This matches any character that is graphical or blank, except controls.
-(TBT)
+制御文字を除く、任意の図形文字か空白にマッチングします。

 =item B<C<\p{SpacePerl}>>

@@ -1793,8 +1810,7 @@

 =end original

-This is the same as C<\s>, including beyond ASCII.
-(TBT)
+これは C<\s> は同様で、ASCII の範囲外を含みます。

 =begin original

@@ -1803,9 +1819,8 @@

 =end original

-Mnemonic: Space, as modified by Perl. (It doesn't include the vertical tab
-which both the Posix standard and Unicode consider to be space.)
-(TBT)
+記憶法: スペース、Perl によって修正。
+(これは、Posix 標準と Unicode の両方が空白と考える垂直タブを含みません。)

 =item B<C<\p{VertSpace}>>

@@ -1815,8 +1830,7 @@

 =end original

-This is the same as C<\v>: A character that changes the spacing vertically.
-(TBT)
+これは C<\v> と同じです: 垂直の空白を変更する文字です。

 =item B<C<\p{Word}>>

@@ -1826,8 +1840,7 @@

 =end original

-This is the same as C<\w>, including beyond ASCII.
-(TBT)
+これは C<\w> と同じで、ASCII 範囲外を含みます。

 =back

@@ -2168,8 +2181,7 @@

 =end original

-The mappings are in effect for the package they are defined in.
-(TBT)
+マッピングは定義したパッケージに対して有効です。

 =head2 Character Encodings for Input and Output

@@ -3049,10 +3061,9 @@

 =end original

-Changing the case of a scalar, that is, using C<uc()>, C<ucfirst()>, C<lc()>,
-and C<lcfirst()>, or C<\L>, C<\U>, C<\u> and C<\l> in regular expression
-substitutions.
-(TBT)
+スカラの大文字小文字を変える; つまり、C<uc()>, C<ucfirst()>, C<lc()>,
+C<lcfirst()> を使ったり、正規表現置換の中で C<\L>, C<\U>, C<\u>, C<\l> を
+使う。

 =item *

@@ -3062,8 +3073,7 @@

 =end original

-Using caseless (C</i>) regular expression matching
-(TBT)
+大文字小文字を無視した (C</i>) 正規表現マッチングを使う

 =item *

@@ -3073,8 +3083,7 @@

 =end original

-Matching a number of properties in regular expressions, such as C<\w>
-(TBT)
+正規表現中に C<\w> のような、多くの特性を使う

 =item *

@@ -3086,9 +3095,11 @@

 =end original

-User-defined case change mappings. You can create a C<ToUpper()> function, for
-example, which overrides Perl's built-in case mappings. The scalar must be
-encoded in utf8 for your function to actually be invoked.
+ユーザー定義の大文字小文字を変えるマッピング。
+You can create a C<ToUpper()> function,
+for example,
+which overrides Perl's built-in case mappings.
+The scalar must be encoded in utf8 for your function to actually be invoked.
 (TBT)

 =back
@@ -3139,8 +3150,8 @@
 This anomaly stems from Perl's attempt to not disturb older programs that
 didn't use Unicode, and hence had no semantics for characters outside of the
 ASCII range (except in a locale), along with Perl's desire to add Unicode
-support seamlessly. The result wasn't seamless: these characters were
-orphaned.
+support seamlessly.
+The result wasn't seamless: these characters were orphaned.
 (TBT)

 =begin original
@@ -3154,12 +3165,13 @@

 =end original

-Work is being done to correct this, but only some of it was complete in time
-for the 5.12 release. What has been finished is the important part of the case
-changing component. Due to concerns, and some evidence, that older code might
+Work is being done to correct this,
+but only some of it was complete in time for the 5.12 release.
+What has been finished is the important part of the case changing component.
+Due to concerns, and some evidence, that older code might
 have come to rely on the existing behavior, the new behavior must be explicitly
-enabled by the feature C<unicode_strings> in the L<feature> pragma, even though
-no new syntax is involved.
+enabled by the feature C<unicode_strings> in the L<feature> pragma,
+even though no new syntax is involved.
 (TBT)

 =begin original
@@ -3189,6 +3201,7 @@
 今のところの回避方法は
 常に utf8::upgrade($string) を呼び出すか標準モジュール L<Encode> を
 使うことです。
+また、基数が 0x100 以上の文字を持つスカラや、
 Also, a scalar that has any characters whose ordinal is above 0x100, or which
 were specified using either of the C<\N{...}> notations will automatically
 have character semantics.
@@ -3230,9 +3243,8 @@

 =end original

-Calling either function on a string that already is in the desired state is a
-no-op.
-(TBT)
+既に望み通りの状態になっている文字列に対してこれらの関数を呼び出しても、
+何も起こりません。

 =head2 Using Unicode in XS

@@ -3506,9 +3518,10 @@
 =end original

 Download the files in the version of Unicode that you want from the Unicode web
-site L<http://www.unicode.org>). These should replace the existing files in
-C<\$Config{privlib}>/F<unicore>. (C<\%Config> is available from the Config
-module.) Follow the instructions in F<README.perl> in that directory to change
+site L<http://www.unicode.org>).
+These should replace the existing files in C<\$Config{privlib}>/F<unicore>.
+(C<\%Config> is available from the Config module.)
+Follow the instructions in F<README.perl> in that directory to change
 some of their names, and then run F<make>.
 (TBT)

@@ -3583,12 +3596,11 @@

 =end original

-There are problems with case-insensitive matches, including those involving
-character classes (enclosed in [square brackets]), characters whose fold
-is to multiple characters (such as the single character LATIN SMALL LIGATURE
-FFL matches case-insensitively with the 3-character string C<ffl>), and
-characters in the Latin-1 Supplement.
-(TBT)
+大文字小文字を無視したマッチングには問題があります;
+(大かっこで囲まれた) 文字クラスに関するもの、
+畳み込まれる文字が複数の文字になるもの (単一の文字 LATIN SMALL LIGATURE
+FFL が 3 文字の文字列 C<ffl> に大文字小文字を無視してマッチングするようなもの)
+Latin-1 Supplement にある文字に関するものなどです。

 =head2 Interaction with Extensions

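
Note for reviewers of the translated paragraphs: the compound, short, and shortcut property forms described in the hunks above (and the loose-matching rules for names such as C<\p{Upper}>) can be spot-checked with a short script. The following is only a minimal sketch, not part of the commit; the sample characters, the labels, and the C<@tests> table are assumptions chosen for illustration, and it assumes a reasonably recent perl.

```perl
use strict;
use warnings;

# Sample characters (hypothetical choices for this sketch).
my $alef    = "\x{05D0}";      # HEBREW LETTER ALEF
my $cluster = "E\x{0301}";     # LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT

# Each entry: [ string to test, pattern, label ].
our @tests = (
    [ $alef,    qr/\p{Script=Hebrew}/,           'compound script form'       ],
    [ $alef,    qr/\p{sc=hebr}/,                 'short script form'          ],
    [ $alef,    qr/\p{Hebrew}/,                  'script shortcut'            ],
    [ '7',      qr/\p{General_Category=Number}/, 'compound general category'  ],
    [ '7',      qr/\p{gc:n}/,                    'short general category'     ],
    [ '7',      qr/\pN/,                         'single-letter shortcut'     ],
    [ 'A',      qr/\p{Upper}/,                   'Upper'                      ],
    [ 'A',      qr/\p{upper}/,                   'upper (case is ignored)'    ],
    [ 'A',      qr/\p{UpPeR}/,                   'UpPeR (case is ignored)'    ],
    [ '7',      qr/\p{PosixDigit}/,              'PosixDigit'                 ],
    [ $cluster, qr/\A\X\z/,                      'two code points, one \X'    ],
);

# Report whether each pattern matches its sample string.
for my $t (@tests) {
    my ($string, $re, $label) = @$t;
    printf "%-28s %s\n", $label, ($string =~ $re ? 'match' : 'no match');
}
```

Every row is expected to report a match if the forms are equivalent as the documentation says; a row reporting "no match" would point at a property name worth re-checking against L<perluniprops> for the perl in use.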