libutf

UTF-8 library
git clone git://git.suckless.org/libutf

rune.3 (3360B)


.Dd $Mdocdate$
.Dt RUNE 3
.Os
.Sh NAME
.Nm runetochar ,
.Nm chartorune ,
.Nm charntorune ,
.Nm runelen ,
.Nm runenlen ,
.Nm fullrune ,
.Nm utfecpy ,
.Nm utflen ,
.Nm utfnlen ,
.Nm utfrune ,
.Nm utfrrune ,
.Nm utfutf
.Nd UTF-8 rune conversion
.Sh SYNOPSIS
.In utf.h
.Ft int
.Fn runetochar "char *s" "Rune *p"
.Ft int
.Fn chartorune "Rune *p" "char *s"
.Ft int
.Fn charntorune "Rune *p" "char *s" "size_t len"
.Ft int
.Fn runelen "Rune r"
.Ft int
.Fn runenlen "Rune *p" "size_t len"
.Ft int
.Fn fullrune "char *s" "size_t len"
.Ft char *
.Fn utfecpy "char *to" "char *end" "char *from"
.Ft size_t
.Fn utflen "char *s"
.Ft size_t
.Fn utfnlen "char *s" "size_t len"
.Ft char *
.Fn utfrune "char *s" "Rune r"
.Ft char *
.Fn utfrrune "char *s" "Rune r"
.Ft char *
.Fn utfutf "char *s" "char *t"
.Sh DESCRIPTION
The following functions convert between UTF-8 byte streams and Unicode runes.
.Pp
.Fn runetochar
converts one rune at
.Fa p
to at most
.Dv UTFmax
bytes starting at
.Fa s ,
and returns the number of bytes copied.
.Dv UTFmax
is the maximum number of bytes required to represent a rune.
If the rune is illegal,
.Fn runetochar
will return 0.
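
The encoding step described for runetochar can be illustrated with a small self-contained sketch. It is not libutf's source: the name sketch_runetochar, the use of uint32_t in place of Rune, the 4-byte limit, and the definition of an illegal rune (a surrogate or a value above U+10FFFF) are all assumptions made for this example.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for runetochar(), not libutf's code.
 * Assumptions: a rune is "illegal" if it is a surrogate
 * (U+D800..U+DFFF) or above U+10FFFF; at most 4 bytes are written.
 */
static int
sketch_runetochar(char *s, const uint32_t *p)
{
	uint32_t r = *p;

	if (r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF))
		return 0;                       /* illegal: nothing written */
	if (r < 0x80) {                         /* 0xxxxxxx */
		s[0] = (char)r;
		return 1;
	}
	if (r < 0x800) {                        /* 110xxxxx 10xxxxxx */
		s[0] = (char)(0xC0 | (r >> 6));
		s[1] = (char)(0x80 | (r & 0x3F));
		return 2;
	}
	if (r < 0x10000) {                      /* 1110xxxx 10xxxxxx 10xxxxxx */
		s[0] = (char)(0xE0 | (r >> 12));
		s[1] = (char)(0x80 | ((r >> 6) & 0x3F));
		s[2] = (char)(0x80 | (r & 0x3F));
		return 3;
	}
	/* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
	s[0] = (char)(0xF0 | (r >> 18));
	s[1] = (char)(0x80 | ((r >> 12) & 0x3F));
	s[2] = (char)(0x80 | ((r >> 6) & 0x3F));
	s[3] = (char)(0x80 | (r & 0x3F));
	return 4;
}
```

For U+20AC this sketch writes the bytes E2 82 AC and returns 3; for a surrogate such as U+D800 it writes nothing and returns 0.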
.Pp
.Fn chartorune
converts at most
.Dv UTFmax
bytes starting at
.Fa s
to one rune at
.Fa p ,
and returns the number of bytes consumed.
If the input is invalid UTF-8,
.Fn chartorune
will convert the sequence to
.Dv Runeerror
(0xFFFD) and return the number of bytes in the invalid sequence.
.Pp
.Fn charntorune
converts at most
.Fa len
bytes starting at
.Fa s
to one rune at
.Fa p ,
and returns the number of bytes consumed.
If the next sequence is longer than
.Fa len
bytes,
.Fn charntorune
will return 0.
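
A bounded, strict decoder in the spirit of charntorune might look like the sketch below. It is an illustration only: sketch_charntorune and SKETCH_RUNEERROR are hypothetical names, uint32_t stands in for Rune, and the validity rules (no overlong forms, no surrogates, nothing above U+10FFFF) are assumptions. As a simplification, the sketch consumes exactly one byte on invalid input, which may differ from the invalid-sequence length that the text above describes for chartorune.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RUNEERROR 0xFFFD  /* hypothetical name for the error rune */

/*
 * Hypothetical stand-in for charntorune(), not libutf's code.
 * Returns 0 if the sequence does not fit in len bytes,
 * 1 with *p = SKETCH_RUNEERROR on invalid input (a simplification),
 * otherwise the number of bytes consumed.
 */
static int
sketch_charntorune(uint32_t *p, const char *s, size_t len)
{
	const unsigned char *u = (const unsigned char *)s;
	uint32_t r, min;
	int n, i;

	if (len == 0)
		return 0;
	if (u[0] < 0x80) {                      /* ASCII */
		*p = u[0];
		return 1;
	}
	if ((u[0] & 0xE0) == 0xC0) {
		n = 2; r = u[0] & 0x1F; min = 0x80;
	} else if ((u[0] & 0xF0) == 0xE0) {
		n = 3; r = u[0] & 0x0F; min = 0x800;
	} else if ((u[0] & 0xF8) == 0xF0) {
		n = 4; r = u[0] & 0x07; min = 0x10000;
	} else {                                /* stray continuation or bad lead */
		*p = SKETCH_RUNEERROR;
		return 1;
	}
	for (i = 1; i < n; i++) {
		if ((size_t)i >= len)
			return 0;               /* incomplete within len bytes */
		if ((u[i] & 0xC0) != 0x80) {    /* not a continuation byte */
			*p = SKETCH_RUNEERROR;
			return 1;
		}
		r = (r << 6) | (u[i] & 0x3F);
	}
	/* reject overlong forms, surrogates, and out-of-range values */
	if (r < min || r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF)) {
		*p = SKETCH_RUNEERROR;
		return 1;
	}
	*p = r;
	return n;
}
```

A caller reading from a stream could treat a return value of 0 as a signal to fetch more bytes before retrying.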
.Pp
.Fn runelen
returns the number of bytes required to convert the rune
.Fa r
into UTF-8.
If the rune is illegal,
.Fn runelen
will return 0.
.Pp
.Fn runenlen
returns the number of bytes required to convert the
.Fa len
runes pointed to by
.Fa p
into UTF-8.
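
The two length queries can be sketched as a range check and a sum. The names sketch_runelen and sketch_runenlen are hypothetical, uint32_t stands in for Rune, and the treatment of illegal runes (surrogates and values above U+10FFFF count as 0 bytes, also inside the sum) is an assumption rather than documented libutf behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for runelen(): bytes needed to encode r, 0 if illegal. */
static int
sketch_runelen(uint32_t r)
{
	if (r >= 0xD800 && r <= 0xDFFF)
		return 0;       /* surrogate: assumed illegal */
	if (r < 0x80)
		return 1;
	if (r < 0x800)
		return 2;
	if (r < 0x10000)
		return 3;
	if (r <= 0x10FFFF)
		return 4;
	return 0;               /* beyond Unicode: assumed illegal */
}

/* Hypothetical stand-in for runenlen(): total bytes for len runes at p. */
static int
sketch_runenlen(const uint32_t *p, size_t len)
{
	size_t i;
	int total = 0;

	for (i = 0; i < len; i++)
		total += sketch_runelen(p[i]);  /* illegal runes add 0 here */
	return total;
}
```

A result like this is typically used to size a buffer before encoding, with one extra byte reserved for the terminating nul.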
.Pp
.Fn fullrune
returns 1 if the first
.Fa len
bytes of the UTF-8 string
.Fa s
form a complete rune, and 0 otherwise.
.Pp
The following functions are analogous to the corresponding string routines, with `utf' substituted for `str', and `rune' for `chr'.
.Pp
.Fn utfecpy
copies UTF-8 sequences until a nul byte has been copied, but writes no sequences beyond
.Fa end .
If any sequences are copied,
.Fa to
is terminated with a nul byte and a pointer to that byte is returned.
Otherwise the original
.Fa to
is returned.
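
The point of a sequence-aware bounded copy is that truncation never splits a multi-byte sequence. The sketch below illustrates that idea under stated assumptions: sketch_utfecpy and sketch_seqlen are hypothetical names, end is taken to point one past the last writable byte of the destination, sequence length is inferred from the lead byte without validating continuation bytes, and a nul is written whenever there is room for it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sequence length implied by a lead byte (1 if unknown). */
static size_t
sketch_seqlen(unsigned char c)
{
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	if ((c & 0xF8) == 0xF0)
		return 4;
	return 1;
}

/*
 * Hypothetical stand-in for utfecpy(), not libutf's code.
 * Copies whole sequences from `from` into [to, end) and returns a
 * pointer to the terminating nul, or `to` if nothing was copied.
 */
static char *
sketch_utfecpy(char *to, char *end, const char *from)
{
	char *p = to;
	size_t n, i;

	if (to >= end)
		return to;                      /* no room at all */
	while (*from != '\0') {
		n = sketch_seqlen((unsigned char)*from);
		/* do not read past a nul inside a truncated source sequence */
		for (i = 1; i < n && from[i] != '\0'; i++)
			;
		n = i;
		if ((size_t)(end - p) < n + 1)  /* sequence plus nul must fit */
			break;
		memcpy(p, from, n);
		p += n;
		from += n;
	}
	*p = '\0';
	return p;
}
```

With a 4-byte destination and the source "a" followed by U+20AC and "b", this sketch copies only "a", because the 3-byte sequence plus the nul would not fit in the remaining 3 bytes.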
.Pp
.Fn utflen
returns the number of runes represented by the UTF-8 string
.Fa s .
.Pp
.Fn utfnlen
returns the number of runes represented by the first
.Fa len
bytes of the UTF-8 string
.Fa s .
If the final sequence is incomplete it will not be counted.
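
The rule that an incomplete final sequence is not counted can be shown with a short sketch. It assumes well-formed input and infers each sequence length from its lead byte only; sketch_utfnlen and sketch_leadlen are hypothetical names, and stopping at a nul byte before len is reached is an assumption about the intended behaviour.

```c
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a lead byte (1 if unknown). */
static size_t
sketch_leadlen(unsigned char c)
{
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	if ((c & 0xF8) == 0xF0)
		return 4;
	return 1;
}

/*
 * Hypothetical stand-in for utfnlen(), not libutf's code.
 * Counts runes in the first len bytes of s, stopping at a nul byte
 * and skipping a final sequence that does not fit within len.
 */
static size_t
sketch_utfnlen(const char *s, size_t len)
{
	size_t i = 0, n, count = 0;

	while (i < len && s[i] != '\0') {
		n = sketch_leadlen((unsigned char)s[i]);
		if (n > len - i)
			break;          /* incomplete final sequence: not counted */
		i += n;
		count++;
	}
	return count;
}
```

For the 4-byte string "a" followed by U+20AC, the sketch returns 2 with len 4 and 1 with len 3, since the second sequence is then cut short.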
.Pp
.Fn utfrune
.Pq Fn utfrrune
returns a pointer to the first
.Pq last
occurrence of the rune
.Fa r
in the UTF-8 string
.Fa s ,
or
.Dv NULL
if there is none.
The terminating nul byte is considered a part of the string
.Fa s .
.Pp
.Fn utfutf
returns a pointer to the first occurrence of the UTF-8 string
.Fa t
as a UTF-8 substring of
.Fa s ,
or
.Dv NULL
if there is none.
If
.Fa t
is the empty string,
.Fn utfutf
returns
.Fa s .
.Sh CONFORMING TO
These functions are compatible with those defined in the Plan 9 C library, with the exception of
.Fn charntorune ,
which is an extension.
However, these functions are much stricter about UTF-8 validity than their Plan 9 counterparts.
.Sh SEE ALSO
.Xr isalpharune 3