Date: Monday December 23, 2019 @ 04:04
Author: argrath

Update of /cvsroot/perldocjp/docs/perl/5.14.1
In directory sf-cvs:/tmp/cvs-serv96145/perl/5.14.1

Modified Files:
    perlre.pod
Log Message:
5.14.1/perlre

===================================================================
File: perlre.pod    Status: Up-to-date

   Working revision:    1.11    Sun Dec 22 19:04:44 2019
   Repository revision: 1.11    /cvsroot/perldocjp/docs/perl/5.14.1/perlre.pod,v
   Sticky Options:      -ko

   Existing Tags:
    No Tags Exist

-------------- next part --------------
Index: docs/perl/5.14.1/perlre.pod
diff -u docs/perl/5.14.1/perlre.pod:1.10 docs/perl/5.14.1/perlre.pod:1.11
--- docs/perl/5.14.1/perlre.pod:1.10    Sun Jul 22 19:01:29 2018
+++ docs/perl/5.14.1/perlre.pod    Mon Dec 23 04:04:44 2019
@@ -10,7 +10,7 @@

 =end original

-perlre - Perl regular expressions
+perlre - Perl's regular expressions

 =head1 DESCRIPTION

@@ -524,20 +524,21 @@

 =end original

-Unlike most locales, which are specific to a language and country pair,
-Unicode classifies all the characters that are letters I<somewhere> as
-C<\w>. For example, your locale might not think that C<LATIN SMALL
-LETTER ETH> is a letter (unless you happen to speak Icelandic), but
-Unicode does. Similarly, all the characters that are decimal digits
-somewhere in the world will match C<\d>; this is hundreds, not 10,
-possible matches. And some of those digits look like some of the 10
-ASCII digits, but mean a different number, so a human could easily think
-a number is a different quantity than it really is. For example,
-C<BENGALI DIGIT FOUR> (U+09EA) looks very much like an
-C<ASCII DIGIT EIGHT> (U+0038). And, C<\d+>, may match strings of digits
-that are a mixture from different writing systems, creating a security
-issue. L<Unicode::UCD/num()> can be used to sort this out.
-(TBT)
+Unlike most locales, which are specific to a particular language and country,
+Unicode classifies every character that is treated as a letter I<somewhere>
+as C<\w>.
+For example, your locale might not consider C<LATIN SMALL LETTER ETH>
+to be a letter (unless you happen to speak Icelandic).
+Similarly, every character that is a digit somewhere in the world matches C<\d>;
+that is hundreds of possible matches, not 10.
+Moreover, some of these digits look like the 10 ASCII digits but
+mean a different number, so a person can easily mistake a number
+for a different quantity than it really is.
+For example, C<BENGALI DIGIT FOUR> (U+09EA) looks very much like
+C<ASCII DIGIT EIGHT> (U+0038).
+Because C<\d+> may match strings of digits mixed from different writing systems,
+it creates a security issue.
+L<Unicode::UCD/num()> can be used to sort this out.

 =begin original

@@ -551,14 +552,13 @@

 =end original

-Also, case-insensitive matching works on the full set of Unicode
-characters. The C<KELVIN SIGN>, for example matches the letters "k" and
-"K"; and C<LATIN SMALL LIGATURE FF> matches the sequence "ff", which,
-if you're not prepared, might make it look like a hexadecimal constant,
-presenting another potential security issue. See
-L<http://unicode.org/reports/tr36> for a detailed discussion of Unicode
-security issues.
-(TBT)
+Also, case-insensitive matching works on the full set of Unicode characters.
+For example, C<KELVIN SIGN> matches "k" and "K";
+and C<LATIN SMALL LIGATURE FF> matches the sequence "ff", which, if you are
+not prepared for it, might look like a hexadecimal constant; this is another
+potential security issue.
+For a detailed discussion of Unicode security issues, see
+L<http://unicode.org/reports/tr36>.

 =begin original

@@ -572,14 +572,15 @@

 =end original

-On the EBCDIC platforms that Perl handles, the native character set is
-equivalent to Latin-1. Thus this modifier changes behavior only when
-the C<"/i"> modifier is also specified, and it turns out it affects only
-two characters, giving them full Unicode semantics: the C<MICRO SIGN>
-will match the Greek capital and small letters C<MU>, otherwise not; and
-the C<LATIN CAPITAL LETTER SHARP S> will match any of C<SS>, C<Ss>,
-C<sS>, and C<ss>, otherwise not.
-(TBT)
+On the EBCDIC platforms that Perl handles, the native character set
+is equivalent to Latin-1.
+Thus this modifier changes behavior only when the C<"/i"> modifier is
+also specified, and as a result it affects only two characters,
+giving them full Unicode semantics:
+C<MICRO SIGN> matches the Greek capital and small letters C<MU>,
+which it otherwise would not;
+and C<LATIN CAPITAL LETTER SHARP S> matches any of C<SS>, C<Ss>,
+C<sS>, and C<ss>, which it otherwise would not.

 =begin original

@@ -611,12 +612,12 @@

 This is like C</u>, except that C<\d>, C<\s>, C<\w>, and the Posix character classes
 are restricted to matching only ASCII-range characters.
-That is, with this modifier, C<\d> always means precisely the
-digits C<"0"> to C<"9">; C<\s> means the five characters C<[ \f\n\r\t]>;
-C<\w> means the 63 characters C<[A-Za-z0-9_]>; and likewise, all the
-Posix classes such as C<[[:print:]]> match only the appropriate
-ASCII-range characters.
-(TBT)
+That is, with this modifier, C<\d> always means precisely the digits
+C<"0"> to C<"9">;
+C<\s> means the five characters C<[ \f\n\r\t]>;
+C<\w> means the 63 characters C<[A-Za-z0-9_]>;
+and likewise, all the Posix classes such as C<[[:print:]]>
+match only the appropriate ASCII-range characters.

 =begin original

@@ -629,13 +630,13 @@

 =end original

-This modifier is useful for people who only incidentally use Unicode.
-With it, one can write C<\d> with confidence that it will only match
-ASCII characters, and should the need arise to match beyond ASCII, you
-can use C<\p{Digit}>, or C<\p{Word}> for C<\w>. There are similar
-C<\p{...}> constructs that can match white space and Posix classes
-beyond ASCII. See L<perlrecharclass/POSIX Character Classes>.
-(TBT)
+This modifier is useful for people who only incidentally use Unicode.
+With it, one can write C<\d> with confidence that it will match only
+ASCII characters, and should the need arise to match beyond ASCII,
+one can use C<\p{Digit}>, or C<\p{Word}> for C<\w>.
+There are similar C<\p{...}> constructs that match white space
+and Posix classes beyond ASCII.
+See L<perlrecharclass/POSIX Character Classes>.

 =begin original

@@ -647,12 +648,11 @@

 =end original

-As you would expect, this modifier causes, for example, C<\D> to mean
-the same thing as C<[^0-9]>; in fact, all non-ASCII characters match
-C<\D>, C<\S>, and C<\W>. C<\b> still means to match at the boundary
-between C<\w> and C<\W>, using the C</a> definitions of them (similarly
-for C<\B>).
-(TBT)
+As you would expect, this modifier makes, for example, C<\D> mean
+the same thing as C<[^0-9]>;
+in fact, all non-ASCII characters match C<\D>, C<\S>, and C<\W>.
+C<\b> still matches at the boundary between C<\w> and C<\W>,
+using the C</a> definitions of them (and likewise for C<\B>).

 =begin original

@@ -664,12 +664,11 @@

 =end original

-Otherwise, C</a> behaves like the C</u> modifier, in that
-case-insensitive matching uses Unicode semantics; for example, "k" will
-match the Unicode C<\N{KELVIN SIGN}> under C</i> matching, and code
-points in the Latin1 range, above ASCII will have Unicode rules when it
-comes to case-insensitive matching.
-(TBT)
+Otherwise, C</a> behaves like the C</u> modifier:
+case-insensitive matching uses Unicode semantics;
+for example, "k" matches C<\N{KELVIN SIGN}> under C</i>,
+and code points in the Latin1 range above ASCII follow Unicode rules
+in case-insensitive matching.

 =begin original

@@ -832,14 +831,12 @@

 =end original

-Which of these modifiers is in effect at any given point in a regular
-expression depends on a fairly complex set of interactions. As
-explained below in L</Extended Patterns> it is possible to explicitly
-specify modifiers that apply only to portions of a regular expression.
-The innermost always has priority over any outer ones, and one applying
-to the whole expression has priority over any of the default settings that are
-described in the remainder of this section.
-(TBT)
+Which modifier is in effect at a given point in a regular expression
+depends on a fairly complex set of interactions.
+As described later in L</Extended Patterns>, it is possible to explicitly
+specify modifiers that apply only to part of a regular expression.
+The innermost one always takes priority over any outer ones, and one applying
+to the whole expression takes priority over the default settings described in the rest of this section.

 =begin original

@@ -850,11 +847,10 @@

 =end original

-The C<L<use re 'E<sol>foo'|re/'E<sol>flags' mode">> pragma can be used to set
-default modifiers (including these) for regular expressions compiled
-within its scope. This pragma has precedence over the other pragmas
-listed below that change the defaults.
-(TBT)
+The C<L<use re 'E<sol>foo'|re/'E<sol>flags' mode">> pragma
+can be used to set the default modifiers (including these) for
+regular expressions compiled within its scope.
+This pragma takes precedence over the other pragmas, described below, that change the defaults.

 =begin original

@@ -869,15 +865,14 @@

 =end original

-Otherwise, C<L<use locale|perllocale>> sets the default modifier to C</l>;
-and C<L<use feature 'unicode_strings|feature>> or
-C<L<use 5.012|perlfunc/use VERSION>> (or higher) set the default to
-C</u> when not in the same scope as either C<L<use locale|perllocale>>
-or C<L<use bytes|bytes>>. Unlike the mechanisms mentioned above, these
-affect operations besides regular expressions pattern matching, and so
-give more consistent results with other operators, including using
-C<\U>, C<\l>, etc. in substitution replacements.
-(TBT)
+Otherwise, C<L<use locale|perllocale>> sets the default modifier to C</l>;
+and C<L<use feature 'unicode_strings|feature>> or
+C<L<use 5.012|perlfunc/use VERSION>> (or higher) sets the default to
+C</u> when neither C<L<use locale|perllocale>> nor C<L<use bytes|bytes>>
+is in the same scope.
+Unlike the mechanisms described above, these affect operations other than
+regular expression pattern matching, and so give results more consistent with
+other operations, including the use of C<\U>, C<\l>, etc. in substitution replacements.

 =begin original

@@ -888,11 +883,10 @@

 =end original

-If none of the above apply, for backwards compatibility reasons, the
-C</d> modifier is the one in effect by default. As this can lead to
-unexpected results, it is best to specify which other rule set should be
-used.
-(TBT)
+If none of the above apply, then for backwards compatibility
+the C</d> modifier is in effect by default.
+As this can lead to unexpected results, it is best to specify
+which other rule set should be used.

 =head4 Character set modifier behavior prior to Perl 5.14
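The character-set rules described in these hunks can be checked with a short Perl script. This is a minimal sketch under stated assumptions. It assumes Perl 5.14 or later, and it assumes that no use locale, use bytes, or use 5.012 is in scope. It switches off the unicode_strings feature explicitly so that patterns without a charset modifier follow the backwards-compatible /d rules. The variable names are illustrative only.

```perl
#!/usr/bin/perl
# Sketch only: assumes Perl 5.14+ (for the /a and /u modifiers and
# Unicode::UCD::num()). unicode_strings is turned off so that patterns
# without a charset modifier get the default /d rules.
use strict;
use warnings;
no feature 'unicode_strings';
use Unicode::UCD qw(num);

my $eth          = "\x{F0}";     # LATIN SMALL LETTER ETH, as a Latin-1 byte string
my $bengali_four = "\x{09EA}";   # BENGALI DIGIT FOUR, resembles ASCII "8"

# /d (the default here): ETH is not \w in a non-UTF-8 string; /u says it is.
my $eth_d = ($eth =~ /^\w$/)  ? 1 : 0;
my $eth_u = ($eth =~ /^\w$/u) ? 1 : 0;

# \d matches any Unicode decimal digit; /a restricts it to "0" through "9".
my $digit_any   = ($bengali_four =~ /^\d$/)  ? 1 : 0;
my $digit_ascii = ($bengali_four =~ /^\d$/a) ? 1 : 0;

# \d+ accepts digits mixed from different scripts; num() rejects the mix.
my $mixed        = "1" . $bengali_four;
my $mixed_is_d   = ($mixed =~ /^\d+$/) ? 1 : 0;
my $mixed_value  = num($mixed);          # undef: the scripts are mixed
my $single_value = num($bengali_four);   # 4, even though it looks like 8

# Case-insensitive matching folds KELVIN SIGN (U+212A) to "k", even under /a.
my $kelvin_i  = ("k" =~ /^\x{212A}$/i)  ? 1 : 0;
my $kelvin_ai = ("k" =~ /^\x{212A}$/ai) ? 1 : 0;

printf "eth as \\w: /d=%d /u=%d\n", $eth_d, $eth_u;
printf "bengali four as \\d: default=%d /a=%d\n", $digit_any, $digit_ascii;
printf "mixed \\d+=%d num(mixed)=%s num(bengali four)=%s\n",
    $mixed_is_d, (defined $mixed_value ? $mixed_value : 'undef'), $single_value;
printf "KELVIN SIGN vs k: /i=%d /ai=%d\n", $kelvin_i, $kelvin_ai;
```

Running the script shows three things. ETH counts as \w only under /u. The Bengali digit matches \d unless /a is given, and num() rejects the mixed-script string. KELVIN SIGN still matches "k" under /ai, as the /a hunk describes.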