mb_strlen

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)

mb_strlen — Get string length

Description

mb_strlen(string $string, ?string $encoding = null): int

Gets the length of a string, counted in characters rather than bytes.

Parameters

string: The string whose length is being checked.
encoding: The character encoding of the string. If it is omitted or null, the internal character encoding is used.

Return Values

Returns the number of characters in string, interpreted in the character encoding encoding. A multibyte character counts as 1.
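
As a minimal sketch of the difference between counting bytes and counting characters, the snippet below compares strlen() with mb_strlen() on a three-character Japanese string. It assumes the mbstring extension is loaded; the string is written with \u{...} escapes (PHP 7.0+) so the result does not depend on how the source file is saved.

```php
<?php
// "日本語" spelled with code point escapes; each character is 3 bytes in UTF-8.
$text = "\u{65E5}\u{672C}\u{8A9E}";

// strlen() counts bytes.
$bytes = strlen($text);

// mb_strlen() counts characters in the named encoding.
$chars = mb_strlen($text, 'UTF-8');

echo "bytes: $bytes, characters: $chars\n"; // bytes: 9, characters: 3
```

Passing the encoding explicitly avoids depending on whatever mb_internal_encoding() currently returns.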

Errors/Exceptions

If encoding is unknown, an error of level E_WARNING is generated. As of PHP 8.0.0, a ValueError is thrown instead.
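
How an unknown encoding is reported depends on the PHP version: older versions emit a warning and return false, while PHP 8 is understood to throw a ValueError. The sketch below wraps both cases in one helper. The name safeLength() is hypothetical and not part of mbstring.

```php
<?php
// Hypothetical helper: returns the character count, or null when the
// encoding name is not recognised by mbstring.
function safeLength(string $s, string $encoding): ?int
{
    try {
        // "@" silences the E_WARNING that older PHP versions emit.
        $n = @mb_strlen($s, $encoding);
    } catch (\ValueError $e) {
        // PHP 8 path: the unknown encoding is reported as an exception.
        return null;
    }
    // Pre-8 path: an unknown encoding yields false.
    return $n === false ? null : $n;
}

var_dump(safeLength('abc', 'UTF-8'));            // int(3)
var_dump(safeLength('abc', 'no-such-encoding')); // NULL
```

Catching \ValueError is harmless on versions where that class does not exist, since the catch clause simply never matches.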

Changelog

Version	Description
8.0.0	`encoding` is nullable now.

See Also

mb_internal_encoding() - Set/Get internal character encoding
grapheme_strlen() - Get string length in grapheme units
iconv_strlen() - Returns the character count of string
strlen() - Get string length

User Contributed Notes 5 notes

Yzmir Ramirez ¶

14 years ago

If you are unsure about what $encoding can be set to, here's a full list of all the encodings supported by this extension:

http://www.php.net/manual/en/mbstring.supported-encodings.php

drake127 ¶

18 years ago

The speed of mb_strlen varies a lot depending on the specified character set.

If you need the length of a string in bytes (strlen cannot be trusted when mbstring.func_overload is enabled), you should use <?php mb_strlen($string, '8bit'); ?>.
It's the fastest way (though still much slower than strlen) to determine the byte length of a string. Other single-byte character sets (ASCII, ISO-8859-1, ...) are several times slower than 8bit.
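
A small sketch of the '8bit' technique from this note, assuming mbstring is loaded. Note that mbstring.func_overload was deprecated in PHP 7.2.0 and removed in PHP 8.0.0, so on current versions plain strlen() also returns the byte length.

```php
<?php
// "naïve": 5 characters, where U+00EF (ï) takes 2 bytes in UTF-8.
$text = "na\u{00EF}ve";

// '8bit' treats every byte as one character, so this is the byte length.
$byteLength = mb_strlen($text, '8bit');

// The same string counted as UTF-8 characters.
$charLength = mb_strlen($text, 'UTF-8');

echo "$byteLength bytes, $charLength characters\n"; // 6 bytes, 5 characters
```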

koala at example dot com ¶

18 years ago

I just did a little benchmarking of the mb_ functions (1,000,000 iterations on lorem ipsum text).

mb_strtolower and mb_strtoupper in particular are really slow (up to 100 times slower than the normal functions). The other functions are roughly comparable, but sometimes up to 5 times slower.

Just be cautious when using mb_ functions in frequently executed scripts.

# test runs: 1000000
# benchmarking strlen vs. mb_strlen
# normal strlen: 3.6795361042023 ms, average: 3.6795361042023E-6 ms
# mb_strlen: 5.5934538841248 ms, average: 5.5934538841248E-6 ms
ok 1 - mb_strlen is slower than strlen
# mb_strlen is 1.52 slower than strlen
#
#
# benchmarking strpos vs. mb_strpos
# normal strpos: 5.5523281097412 ms, average: 5.5523281097412E-6 ms
# mb_strlen: 31.180974960327 ms, average: 3.1180974960327E-5 ms
ok 2 - mb_strlen is slower than strlen
# mb_strpos is 5.62 slower than strpos
#
#
# benchmarking substr vs. mb_substr
# normal substr: 3.4437320232391 ms, average: 3.4437320232391E-6 ms
# mb_strlen: 3.5374181270599 ms, average: 3.5374181270599E-6 ms
ok 3 - mb_strlen is slower than strlen
# mb_substr is 1.03 slower than substr
#
#
# benchmarking strtolower vs. mb_strtolower
# normal strtolower: 4.446839094162 ms, average: 4.446839094162E-6 ms
# mb_strlen: 193.44901108742 ms, average: 0.00019344901108742 ms
ok 4 - mb_strlen is slower than strlen
# mb_strtolower is 43.5 slower than strtolower
#
#
# benchmarking strtoupper vs. mb_strtoupper
# normal strtoupper: 3.0210740566254 ms, average: 3.0210740566254E-6 ms
# mb_strlen: 340.71775603294 ms, average: 0.00034071775603294 ms
ok 5 - mb_strlen is slower than strlen
# mb_strtoupper is 112.78 slower than strtoupper
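
The script that produced the output above is not shown. As a rough sketch of how such a comparison could be reproduced, the snippet below times strlen() against mb_strlen(). The helper benchMs(), the sample text, and the iteration count are all assumptions made for illustration, and hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: calls $fn $iterations times and returns the
// elapsed wall-clock time in milliseconds.
function benchMs(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6;
}

$text = str_repeat('Lorem ipsum dolor sit amet. ', 20);
$iterations = 100000;

$plain = benchMs(function () use ($text) { strlen($text); }, $iterations);
$multi = benchMs(function () use ($text) { mb_strlen($text, 'UTF-8'); }, $iterations);

printf("strlen: %.2f ms, mb_strlen: %.2f ms\n", $plain, $multi);
```

Absolute numbers depend heavily on the PHP version, the encoding, and the input, so results from a run like this should be read as relative indications only.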

Ben ¶

17 years ago

If you find yourself without the mbstring functions and can't easily change that, a quick replacement for mb_strlen on UTF-8 strings is to use a PCRE regex with UTF-8 mode (the u modifier) turned on.

$strlen = preg_match_all("/.{1}/us",$utf8string,$dummy);

This is basically an ugly hack which counts all single character matches, and I'd expect it to be painfully slow on large strings.
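
One way to package this hack is as a fallback that is used only when mbstring is missing. This is a sketch; the function name utf8Length() is hypothetical. It relies on preg_match_all() returning the number of matches, and on it returning false when the subject is not valid UTF-8 under the u modifier.

```php
<?php
// Hypothetical helper: counts UTF-8 code points, preferring mb_strlen()
// and falling back to the PCRE approach when mbstring is not loaded.
function utf8Length(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Under /u each "." match is one code point; /s lets "." match newlines.
    $count = preg_match_all('/./us', $s);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $count;
}

echo utf8Length("\u{65E5}\u{672C}\u{8A9E}"), "\n"; // 3
```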

David Spector ¶

5 years ago

It may not be clear whether PHP actually supports UTF-8, the current de facto standard character encoding for Web documents, which covers most human languages. The good news is: it does.

I wrote a test program which successfully reads in a UTF-8 file (without BOM) and manipulates the characters using mb_substr, mb_strlen, and mb_strpos (mb_substr should normally be avoided, as it must always start its search at character position 0).

The results with a variety of Unicode test characters in UTF-8 encoding, up to four bytes in length, were mostly correct, except that combining accent marks were always counted as separate characters instead of being combined with the preceding character. This can be worked around in code when necessary.
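
The accent behavior described here follows from mb_strlen() counting code points: a letter followed by a combining mark is two code points. The sketch below shows one possible workaround, counting grapheme clusters with the PCRE \X escape. grapheme_strlen() from the intl extension is another option, but it is not assumed to be installed here.

```php
<?php
$composed   = "\u{00E9}";  // "é" as a single precomposed code point
$decomposed = "e\u{0301}"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

$composedLen   = mb_strlen($composed, 'UTF-8');   // 1 code point
$decomposedLen = mb_strlen($decomposed, 'UTF-8'); // 2 code points

// \X matches one extended grapheme cluster, so the base letter and its
// combining mark are counted together.
$clusters = preg_match_all('/\X/u', $decomposed); // 1

echo "$composedLen $decomposedLen $clusters\n"; // 1 2 1
```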
