.Dd $Mdocdate$
.Dt RUNE 3
.Os
.Sh NAME
.Nm runetochar ,
.Nm chartorune ,
.Nm charntorune ,
.Nm runelen ,
.Nm runenlen ,
.Nm fullrune ,
.Nm utfecpy ,
.Nm utflen ,
.Nm utfnlen ,
.Nm utfrune ,
.Nm utfrrune ,
.Nm utfutf
.Nd UTF-8 rune conversion
.Sh SYNOPSIS
.In utf.h
.Ft int
.Fn runetochar "char *s" "Rune *p"
.Ft int
.Fn chartorune "Rune *p" "char *s"
.Ft int
.Fn charntorune "Rune *p" "char *s" "size_t len"
.Ft int
.Fn runelen "Rune r"
.Ft int
.Fn runenlen "Rune *p" "size_t len"
.Ft int
.Fn fullrune "char *s" "size_t len"
.Ft char *
.Fn utfecpy "char *to" "char *end" "char *from"
.Ft size_t
.Fn utflen "char *s"
.Ft size_t
.Fn utfnlen "char *s" "size_t len"
.Ft char *
.Fn utfrune "char *s" "Rune r"
.Ft char *
.Fn utfrrune "char *s" "Rune r"
.Ft char *
.Fn utfutf "char *s" "char *t"
.Sh DESCRIPTION
The following functions convert between UTF-8 byte streams and Unicode runes.
.Pp
.Fn runetochar
converts one rune at
.Fa p
to at most
.Dv UTFmax
bytes starting at
.Fa s ,
and returns the number of bytes copied.
.Dv UTFmax
is the maximum number of bytes required to represent a rune.
If the rune is illegal,
.Fn runetochar
returns 0.
.Pp
.Fn chartorune
converts at most
.Dv UTFmax
bytes starting at
.Fa s
to one rune at
.Fa p ,
and returns the number of bytes consumed.
If the input is invalid UTF-8,
.Fn chartorune
stores
.Dv Runeerror
(0xFFFD) at
.Fa p
and returns the number of bytes in the invalid sequence.
.Pp
.Fn charntorune
converts at most
.Fa len
bytes starting at
.Fa s
to one rune at
.Fa p ,
and returns the number of bytes consumed.
If the next sequence is longer than
.Fa len
bytes,
.Fn charntorune
returns 0.
.Pp
.Fn runelen
returns the number of bytes required to convert the rune
.Fa r
into UTF-8.
If the rune is illegal,
.Fn runelen
returns 0.
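.Pp
As a rough illustration of the byte counts described above, the following sketch mirrors the documented return values of
.Fn runelen
and
.Fn runetochar
without calling this library.
The names
.Fn sketch_runelen
and
.Fn sketch_runetochar
are hypothetical, and the sketch assumes that an illegal rune is a surrogate or a value above 0x10FFFF, which this page does not state.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Hypothetical stand-in for runelen(); not the library's code.
 * Assumes "illegal" means a surrogate or a value above 0x10FFFF. */
static int
sketch_runelen(unsigned long r)
{
	if (r < 0x80)
		return 1;
	if (r < 0x800)
		return 2;
	if (r >= 0xD800 && r <= 0xDFFF)
		return 0;	/* surrogate: treated as illegal here */
	if (r < 0x10000)
		return 3;
	if (r <= 0x10FFFF)
		return 4;
	return 0;		/* beyond Unicode: treated as illegal here */
}

/* Hypothetical stand-in for runetochar(): writes the UTF-8 form of r
 * to s and returns the byte count, or 0 if r is treated as illegal. */
static int
sketch_runetochar(unsigned char *s, unsigned long r)
{
	int i, n = sketch_runelen(r);

	if (n == 0)
		return 0;
	if (n == 1) {
		s[0] = (unsigned char)r;
		return 1;
	}
	/* fill continuation bytes from the end, six bits at a time */
	for (i = n - 1; i > 0; i--) {
		s[i] = (unsigned char)(0x80 | (r & 0x3F));
		r >>= 6;
	}
	/* lead byte: n high bits set, then the remaining payload bits */
	s[0] = (unsigned char)((0xFF << (8 - n)) | r);
	return n;
}
```
.Ed
.Pp
Under these assumptions, passing 0x20AC to
.Fn sketch_runetochar
writes the three bytes 0xE2 0x82 0xAC and returns 3, while passing 0xD800 writes nothing and returns 0.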
.Pp
.Fn runenlen
returns the number of bytes required to convert the
.Fa len
runes pointed to by
.Fa p
into UTF-8.
.Pp
.Fn fullrune
returns 1 if the first
.Fa len
bytes of the UTF-8 string
.Fa s
form a complete rune, and 0 otherwise.
.Pp
The following functions are analogous to the corresponding string routines, with `utf' substituted for `str', and `rune' for `chr'.
.Pp
.Fn utfecpy
copies UTF-8 sequences until a NUL byte has been copied, but writes no sequences beyond
.Fa end .
If any sequences are copied,
.Fa to
is terminated with a NUL byte and a pointer to that byte is returned.
Otherwise the original
.Fa to
is returned.
.Pp
.Fn utflen
returns the number of runes represented by the UTF-8 string
.Fa s .
.Pp
.Fn utfnlen
returns the number of runes represented by the first
.Fa len
bytes of the UTF-8 string
.Fa s .
If the final sequence is incomplete, it is not counted.
.Pp
.Fn utfrune
.Pq Fn utfrrune
returns a pointer to the first
.Pq last
occurrence of the rune
.Fa r
in the UTF-8 string
.Fa s ,
or
.Dv NULL
if there is none.
The terminating NUL byte is considered a part of the string
.Fa s .
.Pp
.Fn utfutf
returns a pointer to the first occurrence of the UTF-8 string
.Fa t
as a UTF-8 substring of
.Fa s ,
or
.Dv NULL
if there is none.
If
.Fa t
is the empty string,
.Fn utfutf
returns
.Fa s .
.Sh CONFORMING TO
These functions are compatible with those defined in the Plan 9 C library, with the exception of
.Fn charntorune ,
which is an extension.
However, these functions are much stricter about UTF-8 validity than their Plan 9 counterparts.
.Sh SEE ALSO
.Xr isalpharune 3
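.Sh EXAMPLES
The counting rule given for
.Fn utfnlen
in the DESCRIPTION, where an incomplete final sequence is not counted, can be sketched as follows.
This is not the library's implementation: the names
.Fn sketch_seqlen
and
.Fn sketch_utfnlen
are hypothetical, and the sketch assumes well-formed input of at most four bytes per sequence and performs none of the validity checks this library applies.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a lead byte,
 * assuming well-formed UTF-8 of at most four bytes per sequence. */
static size_t
sketch_seqlen(unsigned char c)
{
	if (c < 0x80)
		return 1;
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	if ((c & 0xF8) == 0xF0)
		return 4;
	return 1;	/* stray byte: counted as one unit in this sketch */
}

/* Hypothetical stand-in for utfnlen(): counts the complete sequences
 * within the first len bytes of s, stopping early at a zero byte. */
static size_t
sketch_utfnlen(const char *s, size_t len)
{
	size_t i = 0, n = 0, k;

	while (i < len && s[i] != 0) {
		k = sketch_seqlen((unsigned char)s[i]);
		if (k > len - i)
			break;	/* incomplete final sequence: not counted */
		i += k;
		n++;
	}
	return n;
}
```
.Ed
.Pp
Given the six bytes 0x61 0xC3 0xA9 0xE2 0x82 0xAC,
.Fn sketch_utfnlen
returns 3 when
.Fa len
is 6 and 2 when
.Fa len
is 5, because the last sequence is then cut short.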