<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
<!ENTITY version SYSTEM "version.xml">
]>
<chapter id="clusters">
<title>Clusters</title>
<section id="clusters-and-shaping">
<title>Clusters and shaping</title>
<para>
In text shaping, a <emphasis>cluster</emphasis> is a sequence of
characters that needs to be treated as a single, indivisible
unit. A single letter or symbol can be a cluster of its
own. Other clusters correspond to longer subsequences of the
input code points &mdash; such as a ligature or conjunct form
&mdash; and require the shaper to ensure that the cluster is not
broken during the shaping process.
</para>
<para>
A cluster is distinct from a <emphasis>grapheme</emphasis>,
which is the smallest unit of meaning in a writing system or
script.
</para>
<para>
The definitions of the two terms are similar. However, clusters
are only relevant for script shaping and glyph layout. In
contrast, graphemes are a property of the underlying script, and
are of interest when client programs implement orthographic
or linguistic functionality.
</para>
<para>
For example, two individual letters are often two separate
graphemes. When two letters form a ligature, however, they
combine into a single glyph. They are then part of the same
cluster and are treated as a unit by the shaping engine &mdash;
even though the two original, underlying letters remain separate
graphemes.
</para>
<para>
HarfBuzz is concerned with clusters, <emphasis>not</emphasis>
with graphemes &mdash; although client programs using HarfBuzz
may still care about graphemes for other reasons from time to time.
</para>
<para>
During the shaping process, there are several shaping operations
that may merge adjacent characters (for example, when two code
points form a ligature or a conjunct form and are replaced by a
single glyph) or split one character into several (for example,
when decomposing a code point through the
<literal>ccmp</literal> feature). Operations like these alter
clusters; HarfBuzz tracks the changes to ensure that no clusters
get lost or broken during shaping.
</para>
<para>
HarfBuzz records cluster information independently from how
shaping operations affect the individual glyphs returned in an
output buffer. Consequently, a client program using HarfBuzz can
utilize the cluster information to implement features such as:
</para>
<itemizedlist>
<listitem>
<para>
Correctly positioning the cursor within a shaped text run,
even when characters have formed ligatures, composed or
decomposed, reordered, or undergone other shaping operations.
</para>
</listitem>
<listitem>
<para>
Correctly highlighting a text selection that includes some,
but not all, of the characters in a word.
</para>
</listitem>
<listitem>
<para>
Applying text attributes (such as color or underlining) to
part, but not all, of a word.
</para>
</listitem>
<listitem>
<para>
Generating output document formats (such as PDF) with
embedded text that can be fully extracted.
</para>
</listitem>
<listitem>
<para>
Determining the mapping between input characters and output
glyphs, such as which glyphs are ligatures.
</para>
</listitem>
<listitem>
<para>
Performing line-breaking, justification, and other
line-level or paragraph-level operations that must be done
after shaping is complete, but which require examining
character-level properties.
</para>
</listitem>
</itemizedlist>
</section>
<section id="working-with-harfbuzz-clusters">
<title>Working with HarfBuzz clusters</title>
<para>
When you add text to a HarfBuzz buffer, each code point must be
assigned a <emphasis>cluster value</emphasis>.
</para>
<para>
This cluster value is an arbitrary number; HarfBuzz uses it only
to distinguish between clusters. Many client programs will use
the index of each code point in the input text stream as the
cluster value. This is for the sake of convenience; the actual
value does not matter.
</para>
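<para>
The following sketch shows both approaches with the public buffer
API. It assumes that the HarfBuzz headers and library are
installed. The helper functions and the sample text are
hypothetical. <literal>hb_buffer_add_codepoints()</literal> uses
each code point's index in the supplied array as its cluster
value. <literal>hb_buffer_add()</literal> instead takes an explicit
cluster value from the caller. The UTF-8 variant,
<literal>hb_buffer_add_utf8()</literal>, uses byte offsets into the
string as cluster values.
</para>
<programlisting language="C">
<![CDATA[
#include <hb.h>

/* Sample input: five code points standing in for real text. */
static const hb_codepoint_t sample_text[] = { 'A', 'B', 'C', 'D', 'E' };

/* Hypothetical helper: the cluster values become the indices 0..4. */
static void fill_with_indices (hb_buffer_t *buf)
{
  /* item_offset 0 and item_length -1 add the whole array. */
  hb_buffer_add_codepoints (buf, sample_text, 5, 0, -1);
}

/* Hypothetical helper: same text, client-chosen cluster values.
   HarfBuzz only compares the values, so 100, 110, ..., 140 work
   just as well as 0..4. */
static void fill_with_custom_values (hb_buffer_t *buf)
{
  hb_buffer_set_content_type (buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
  for (unsigned int i = 0; i < 5; i++)
    hb_buffer_add (buf, sample_text[i], 100 + 10 * i);
}
]]>
</programlisting>
<para>
Before shaping, the values can be read back from the
<literal>cluster</literal> field of each
<literal>hb_glyph_info_t</literal> returned by
<literal>hb_buffer_get_glyph_infos()</literal>.
</para>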
<para>
Some of the shaping operations performed by HarfBuzz &mdash;
such as reordering, composition, decomposition, and substitution
&mdash; may alter the cluster values of some characters. The
final cluster values in the buffer at the end of the shaping
process will indicate to client programs which subsequences of
glyphs represent a cluster and, therefore, must not be
separated.
</para>
<para>
In addition, client programs can query the final cluster values
to discern other potentially important information about the
glyphs in the output buffer (such as whether or not a ligature
was formed).
</para>
<para>
For example, if the initial sequence of cluster values was:
</para>
<programlisting>
0,1,2,3,4
</programlisting>
<para>
and the final sequence of cluster values is:
</para>
<programlisting>
0,0,3,3
</programlisting>
<para>
then there are two clusters in the output buffer: the first
cluster includes the first two glyphs, and the second cluster
includes the third and fourth glyphs. It is also evident that a
ligature or conjunct has been formed, because there are fewer
glyphs in the output buffer (four) than there were code points
in the input buffer (five).
</para>
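<para>
The reasoning in this example can be expressed as a small, plain C
sketch. It assumes that the final cluster values have already been
copied out of the shaped buffer in glyph order, for instance from
the <literal>cluster</literal> field of each
<literal>hb_glyph_info_t</literal>. The function names are
hypothetical and are not part of HarfBuzz.
</para>
<programlisting language="C">
<![CDATA[
#include <stddef.h>

/* Count the clusters in shaped output: each run of adjacent glyphs
   that share a final cluster value is one cluster. */
static size_t count_clusters (const unsigned int *clusters, size_t num_glyphs)
{
  size_t count = 0;
  for (size_t i = 0; i < num_glyphs; i++)
    if (i == 0 || clusters[i] != clusters[i - 1])
      count++;
  return count;
}

/* The reasoning from the example: fewer output glyphs than input
   code points means that some code points were combined into one glyph. */
static int glyph_count_shrank (size_t num_codepoints, size_t num_glyphs)
{
  return num_glyphs < num_codepoints;
}
]]>
</programlisting>
<para>
For the values 0,0,3,3 above, <literal>count_clusters()</literal>
reports two clusters, and five code points shaped into four glyphs
make <literal>glyph_count_shrank()</literal> return true.
</para>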
<para>
Although client programs using HarfBuzz are free to assign
initial cluster values in any manner they choose, HarfBuzz
does offer some useful guarantees if the cluster values are
assigned in a monotonic (either non-decreasing or non-increasing)
order.
</para>
<para>
For buffers in the left-to-right (LTR)
or top-to-bottom (TTB) text flow direction,
HarfBuzz will preserve the monotonic property at cluster levels
0 and 1 (described below): client programs
are guaranteed that monotonically increasing initial cluster
values will be returned as monotonically increasing final
cluster values.
</para>
<para>
For buffers in the right-to-left (RTL)
or bottom-to-top (BTT) text flow direction,
the directionality of the buffer itself is reversed for final
output as a matter of design. Therefore, at the same cluster
levels, HarfBuzz inverts the
monotonic property: client programs are guaranteed that
monotonically increasing initial cluster values will be
returned as monotonically <emphasis>decreasing</emphasis> final
cluster values.
</para>
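<para>
A client can check this guarantee on shaped output with a few lines
of C. In the sketch below, the function name and the
<literal>backward</literal> flag are hypothetical. A real program
could derive the flag by applying
<literal>HB_DIRECTION_IS_BACKWARD()</literal> to the value returned by
<literal>hb_buffer_get_direction()</literal>. The comparisons are
non-strict because merged clusters repeat the same value on
several glyphs.
</para>
<programlisting language="C">
<![CDATA[
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the final cluster values follow the documented order:
   never decreasing for forward (LTR/TTB) buffers, and never increasing
   for backward (RTL/BTT) buffers. The caller supplies `backward`. */
static bool clusters_in_expected_order (const unsigned int *clusters,
                                        size_t num_glyphs, bool backward)
{
  for (size_t i = 1; i < num_glyphs; i++)
  {
    if (!backward && clusters[i] < clusters[i - 1])
      return false;
    if (backward && clusters[i] > clusters[i - 1])
      return false;
  }
  return true;
}
]]>
</programlisting>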
<para>
Client programs can adjust how HarfBuzz handles clusters during
shaping by setting the
<literal>cluster_level</literal> of the
buffer. HarfBuzz offers three <emphasis>levels</emphasis> of
clustering support for this property:
</para>
<itemizedlist>
<listitem>
<para><emphasis>Level 0</emphasis> is the default and
reproduces the behavior of the old HarfBuzz library.
</para>
<para>
The distinguishing feature of level 0 behavior is that, at
the beginning of processing the buffer, all code points that
are categorized as <emphasis>marks</emphasis>,
<emphasis>modifier symbols</emphasis>, or
<emphasis>Emoji extended pictographic</emphasis> modifiers,
as well as the <emphasis>Zero Width Joiner</emphasis> and
<emphasis>Zero Width Non-Joiner</emphasis> code points, are
assigned the cluster value of the closest preceding code
point from a <emphasis>different</emphasis> category.
</para>
<para>
In essence, whenever a base character is followed by a mark
character or a sequence of mark characters, those marks are
reassigned to the same initial cluster value as the base
character. This reassignment is referred to as
"merging" the affected clusters. This behavior is based on
the Grapheme Cluster Boundary specification in <ulink
url="https://www.unicode.org/reports/tr29/#Regex_Definitions">Unicode
Standard Annex #29</ulink>.
</para>
<para>
Client programs can specify level 0 behavior for a buffer by
setting its <literal>cluster_level</literal> to
<literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Level 1</emphasis> tweaks the old behavior
slightly to produce better results. Therefore, level 1
clustering is recommended for code that is not required to
implement backward compatibility with the old HarfBuzz.
</para>
<para>
Level 1 differs from level 0 by not merging the
clusters of marks and other modifier code points with the
preceding "base" code point's cluster. By preserving the
separate cluster values of these marks and modifier code
points, script shapers can perform additional operations
that might lead to improved results (for example, reordering
a sequence of marks).
</para>
<para>
Client programs can specify level 1 behavior for a buffer by
setting its <literal>cluster_level</literal> to
<literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Level 2</emphasis> differs significantly in how it
treats cluster values. In level 2, HarfBuzz never merges
clusters.
</para>
<para>
This difference can be seen most clearly when HarfBuzz processes
ligature substitutions and glyph decompositions. In level 0
and level 1, ligatures and glyph decomposition both involve
merging clusters; in level 2, neither of these operations
triggers a merge.
</para>
<para>
Client programs can specify level 2 behavior for a buffer by
setting its <literal>cluster_level</literal> to
<literal>HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</literal>.
</para>
</listitem>
</itemizedlist>
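<para>
The following sketch assumes that the HarfBuzz headers and library
are available. The helper name is hypothetical. Selecting a level
takes one call to <literal>hb_buffer_set_cluster_level()</literal>
before the buffer is shaped.
<literal>hb_buffer_get_cluster_level()</literal> reads the setting
back. A newly created buffer reports
<literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>, which
is the level 0 default.
</para>
<programlisting language="C">
<![CDATA[
#include <hb.h>

/* Hypothetical helper: create a buffer that uses level 1 clustering,
   the level recommended above when compatibility with the old
   HarfBuzz behavior is not required. Release it with
   hb_buffer_destroy() when done. */
static hb_buffer_t *create_level1_buffer (void)
{
  hb_buffer_t *buf = hb_buffer_create ();
  hb_buffer_set_cluster_level (buf, HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS);
  return buf;
}
]]>
</programlisting>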
<para>
As mentioned earlier, client programs using HarfBuzz often
assign initial cluster values in a buffer by reusing the indices
of the code points in the input text. This gives a sequence of
cluster values that is monotonically increasing (for example,
0,1,2,3,4).
</para>
<para>
It is not <emphasis>required</emphasis> that the cluster values
in a buffer be monotonically increasing. However, if the initial
cluster values in a buffer are monotonic and the buffer is
configured to use cluster level 0 or 1, then HarfBuzz
guarantees that the final cluster values in the shaped buffer
will also be monotonic. No such guarantee is made for cluster
level 2.
</para>
<para>
In levels 0 and 1, HarfBuzz implements the following conceptual
model for cluster values:
</para>
<itemizedlist spacing="compact">
<listitem>
<para>
If the sequence of input cluster values is monotonic, the
sequence of cluster values will remain monotonic.
</para>
</listitem>
<listitem>
<para>
Each cluster value represents a single cluster.
</para>
</listitem>
<listitem>
<para>
Each cluster contains one or more glyphs and one or more
characters.
</para>
</listitem>
</itemizedlist>
<para>
In practice, this model offers several benefits. Assuming that
the initial cluster values were monotonically increasing
and distinct before shaping began, then, in the final output:
</para>
<itemizedlist spacing="compact">
<listitem>
<para>
All adjacent glyphs having the same final cluster
value belong to the same cluster.
</para>
</listitem>
<listitem>
<para>
Each character belongs to the cluster that has the highest
cluster value <emphasis>not larger than</emphasis> its
initial cluster value.
</para>
</listitem>
</itemizedlist>
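<para>
The second rule can be turned into a lookup from a character to
its cluster. The plain C sketch below uses a hypothetical function
name. It assumes a forward-direction buffer whose final cluster
values were copied into an array in glyph order, and it assumes
that the first glyph's value is no larger than any character's
initial value. In the earlier example, final values 0,0,3,3 came
from initial values 0,1,2,3,4. The characters with initial values
0, 1, and 2 therefore belong to cluster 0, and those with 3 and 4
belong to cluster 3.
</para>
<programlisting language="C">
<![CDATA[
#include <stddef.h>

/* Returns the cluster that a character belongs to: the highest final
   cluster value that is not larger than the character's initial value.
   Assumes num_glyphs > 0 and, as in a forward-direction buffer whose
   values started at the first character's index, that glyph_clusters[0]
   is not larger than char_value. */
static unsigned int cluster_of_character (const unsigned int *glyph_clusters,
                                          size_t num_glyphs,
                                          unsigned int char_value)
{
  unsigned int best = glyph_clusters[0];
  for (size_t i = 1; i < num_glyphs; i++)
    if (glyph_clusters[i] <= char_value && glyph_clusters[i] > best)
      best = glyph_clusters[i];
  return best;
}
]]>
</programlisting>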
</section>
<section id="a-clustering-example-for-levels-0-and-1">
<title>A clustering example for levels 0 and 1</title>
<para>
The basic shaping operations affect clusters in a predictable
manner when using level 0 or level 1:
</para>
<itemizedlist>
<listitem>
<para>
When two or more clusters <emphasis>merge</emphasis>, the
resulting merged cluster takes as its cluster value the
<emphasis>minimum</emphasis> of the incoming cluster values.
</para>
</listitem>
<listitem>
<para>
When a cluster <emphasis>decomposes</emphasis>, all of the
resulting child clusters inherit as their cluster value the
cluster value of the parent cluster.
</para>
</listitem>
<listitem>
<para>
When a character is <emphasis>reordered</emphasis>, the
reordered character and all clusters that the character
moves past as part of the reordering are merged into one cluster.
</para>
</listitem>
</itemizedlist>
<para>
The functionality, guarantees, and benefits of level 0 and level
1 behavior can be seen with some examples. First, let us examine
what happens with cluster values when shaping involves cluster
merging with ligatures and decomposition.
</para>
<para>
Let's say we start with the following character sequence (top row) and
initial cluster values (bottom row):
</para>
<programlisting>
A,B,C,D,E
0,1,2,3,4
</programlisting>
<para>
During shaping, HarfBuzz maps these characters to glyphs from
the font. For simplicity, let us assume that each character maps
to the corresponding, identical-looking glyph:
</para>
<programlisting>
A,B,C,D,E
0,1,2,3,4
</programlisting>
<para>
Now if, for example, <literal>B</literal> and <literal>C</literal>
form a ligature, then the clusters to which they belong
&quot;merge&quot;. This merged cluster takes for its cluster
value the minimum of all the cluster values of the clusters that
went into the ligature. In this case, we get:
</para>
<programlisting>
A,BC,D,E
0,1 ,3,4
</programlisting>
<para>
because 1 is the minimum of the set {1,2}, which were the
cluster values of <literal>B</literal> and
<literal>C</literal>.
</para>
<para>
Next, let us say that the <literal>BC</literal> ligature glyph
decomposes into three components, and <literal>D</literal> also
decomposes into two components. Whenever a cluster decomposes,
its components each inherit the cluster value of their parent:
</para>
<programlisting>
A,BC0,BC1,BC2,D0,D1,E
0,1 ,1 ,1 ,3 ,3 ,4
</programlisting>
<para>
Next, if <literal>BC2</literal> and <literal>D0</literal> form a
ligature, then their clusters (cluster values 1 and 3) merge into
<literal>min(1,3) = 1</literal>:
</para>
<programlisting>
A,BC0,BC1,BC2D0,D1,E
0,1 ,1 ,1 ,1 ,4
</programlisting>
<para>
Note that the entirety of cluster 3 merges into cluster 1, not
just the <literal>D0</literal> glyph. This reflects the fact
that the cluster <emphasis>must</emphasis> be treated as an
indivisible unit.
</para>
<para>
At this point, cluster 1 means: the character sequence
<literal>BCD</literal> is represented by glyphs
<literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
further.
</para>
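<para>
The walkthrough above can be replayed with a toy model of two
rules: a ligature takes the minimum cluster value and merges every
cluster between the two merging values, and a decomposition passes
the parent's value to each component. This is an illustrative
sketch under those assumptions, not HarfBuzz's internal
implementation. The glyph type, the size limit, and the function
names are hypothetical.
</para>
<programlisting language="C">
<![CDATA[
#include <stdio.h>
#include <string.h>

/* Upper bound on glyphs in this toy model; callers size arrays with it. */
#define MAX_GLYPHS 16

/* A toy glyph: a printable name and its cluster value. */
typedef struct
{
  char name[16];
  unsigned int cluster;
} glyph_t;

/* Merge rule: every glyph whose cluster value lies between the two
   merging values takes the minimum. For a non-decreasing sequence this
   merges whole clusters, as in the BC2 + D0 step. */
static void merge_clusters (glyph_t *g, size_t n, unsigned int a, unsigned int b)
{
  unsigned int lo = a < b ? a : b;
  unsigned int hi = a < b ? b : a;
  for (size_t i = 0; i < n; i++)
    if (g[i].cluster >= lo && g[i].cluster <= hi)
      g[i].cluster = lo;
}

/* Ligature: glyphs i and i + 1 (requires i + 1 < *n) become one glyph
   whose name joins both names. */
static void ligate (glyph_t *g, size_t *n, size_t i)
{
  merge_clusters (g, *n, g[i].cluster, g[i + 1].cluster);
  strncat (g[i].name, g[i + 1].name, sizeof g[i].name - strlen (g[i].name) - 1);
  memmove (&g[i + 1], &g[i + 2], (*n - i - 2) * sizeof *g);
  (*n)--;
}

/* Decomposition: glyph i becomes k >= 1 glyphs that all inherit its
   cluster value. The caller must ensure *n + k - 1 <= MAX_GLYPHS. */
static void decompose (glyph_t *g, size_t *n, size_t i,
                       const char *const *parts, size_t k)
{
  unsigned int cluster = g[i].cluster;
  memmove (&g[i + k], &g[i + 1], (*n - i - 1) * sizeof *g);
  for (size_t j = 0; j < k; j++)
  {
    snprintf (g[i + j].name, sizeof g[i + j].name, "%s", parts[j]);
    g[i + j].cluster = cluster;
  }
  *n += k - 1;
}
]]>
</programlisting>
<para>
The model first applies <literal>ligate()</literal> to
<literal>B</literal> and <literal>C</literal>, then decomposes
<literal>BC</literal> into three glyphs and <literal>D</literal>
into two, and finally joins <literal>BC2</literal> with
<literal>D0</literal>. This sequence ends with the values 0,1,1,1,1,4
shown above. The model matches the text only for non-decreasing
sequences, because it merges by value range. Levels 0 and 1
guarantee that ordering for forward-direction buffers.
</para>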
</section>
<section id="reordering-in-levels-0-and-1">
<title>Reordering in levels 0 and 1</title>
<para>
Another common operation in some shapers is glyph
reordering. In order to maintain a monotonic cluster sequence
when glyph reordering takes place, HarfBuzz merges the clusters
of everything in the reordering sequence.
</para>
<para>
For example, let us again start with the character sequence (top
row) and initial cluster values (bottom row):
</para>
<programlisting>
A,B,C,D,E
0,1,2,3,4
</programlisting>
<para>
If <literal>D</literal> is reordered to the position immediately
before <literal>B</literal>, then HarfBuzz merges the
<literal>B</literal>, <literal>C</literal>, and
<literal>D</literal> clusters &mdash; all the clusters between
the final position of the reordered glyph and its original
position. This means that we get:
</para>
<programlisting>
A,D,B,C,E
0,1,1,1,4
</programlisting>
<para>
as the final cluster sequence.
</para>
<para>
Merging this many clusters is not ideal, but it is the only
sensible way for HarfBuzz to maintain the guarantee that the
sequence of cluster values remains monotonic and to retain the
true relationship between glyphs and characters.
</para>
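<para>
The same merge rule, applied to reordering, can be sketched in
plain C. This is a toy model of the behavior described above
rather than HarfBuzz's implementation. Glyphs are single letters,
and the function name is hypothetical. Moving <literal>D</literal>
from index 3 to index 1 in <literal>ABCDE</literal> with values
0,1,2,3,4 yields <literal>ADBCE</literal> with values 0,1,1,1,4,
which matches the example.
</para>
<programlisting language="C">
<![CDATA[
#include <stddef.h>

/* Toy model of reordering at levels 0 and 1: the glyph at index `from`
   moves back to index `to` (to < from), and every cluster it moves
   past merges with it, taking the minimum value in that span. */
static void reorder_and_merge (char *glyphs, unsigned int *clusters,
                               size_t from, size_t to)
{
  char moved = glyphs[from];
  unsigned int lo = clusters[from];

  /* Smallest cluster value among the positions to..from. */
  for (size_t i = to; i < from; i++)
    if (clusters[i] < lo)
      lo = clusters[i];

  /* Shift the glyphs in between one slot forward, then place the moved glyph. */
  for (size_t i = from; i > to; i--)
    glyphs[i] = glyphs[i - 1];
  glyphs[to] = moved;

  /* Every affected position now belongs to one merged cluster. */
  for (size_t i = to; i <= from; i++)
    clusters[i] = lo;
}
]]>
</programlisting>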
</section>
<section id="the-distinction-between-levels-0-and-1">
<title>The distinction between levels 0 and 1</title>
<para>
The preceding examples demonstrate the main effects of using
cluster levels 0 and 1. The only difference between the two
levels is this: in level 0, at the very beginning of the shaping
process, HarfBuzz merges the cluster of each base character
with the clusters of all Unicode marks (combining or not) and
modifiers that follow it.
</para>
<para>
For example, let us start with the following character sequence
(top row) and accompanying initial cluster values (bottom row):
</para>
<programlisting>
A,acute,B
0,1 ,2
</programlisting>
<para>
The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
using cluster level 0 on this sequence, then the
<literal>A</literal> and <literal>acute</literal> clusters will
merge, and the result will become:
</para>
<programlisting>
A,acute,B
0,0 ,2
</programlisting>
<para>
This merger is performed before any other script-shaping
steps.
</para>
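<para>
The following is a minimal sketch of this initial merge. It assumes
that the caller has already flagged which code points are marks or
modifiers. HarfBuzz itself derives that classification from Unicode
character properties. The flag array and the function name are
hypothetical. The loop runs forward, so every mark in a run of
several marks inherits the value of the base before it. Applied to
the example, the values 0,1,2 become 0,0,2.
</para>
<programlisting language="C">
<![CDATA[
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the level 0 pre-pass: each flagged mark/modifier takes
   the cluster value of the code point before it. Because the loop runs
   forward, that value has already been propagated from the preceding
   base, so a run of marks all joins the base's cluster. A mark at
   index 0 keeps its own value. */
static void level0_merge_marks (const bool *is_mark, unsigned int *clusters,
                                size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (is_mark[i])
      clusters[i] = clusters[i - 1];
}
]]>
</programlisting>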
<para>
This initial cluster merging is the default behavior of the
Windows shaping engine, and the old HarfBuzz codebase copied
that behavior to maintain compatibility. Consequently, it has
remained the default behavior in the new HarfBuzz codebase.
</para>
<para>
But this initial cluster-merging behavior makes it impossible
for client programs to implement some features (such as
coloring diacritic marks differently from their base
characters). That is why, in level 1, HarfBuzz does not perform
the initial merging step.
</para>
<para>
For client programs that rely on HarfBuzz cluster values to
perform cursor positioning, level 0 is more convenient. But
relying on cluster boundaries for cursor positioning is wrong: cursor
positions should be determined based on Unicode grapheme
boundaries, not on shaping-cluster boundaries. As such, using
level 1 clustering behavior is recommended.
</para>
<para>
One final facet of levels 0 and 1 is worth noting. HarfBuzz
currently does not allow any
<emphasis>multiple-substitution</emphasis> GSUB lookups to
replace a glyph with zero glyphs (in other words, to delete a
glyph).
</para>
<para>
But, in some other situations, glyphs can be deleted. In
those cases, if the glyph being deleted is the last glyph of its
cluster, HarfBuzz makes sure to merge the deleted glyph's
cluster with a neighboring cluster.
</para>
<para>
This is done primarily to make sure that the starting cluster of the
text always has the cluster index pointing to the start of the text
for the run; more than one client program currently relies on this
guarantee.
</para>
<para>
Incidentally, Apple's CoreText does something different to
maintain the same promise: it inserts a glyph with id 65535 at
the beginning of the glyph string if the glyph corresponding to
the first character in the run was deleted. HarfBuzz might do
something similar in the future.
</para>
</section>
<section id="level-2">
<title>Level 2</title>
<para>
HarfBuzz's level 2 cluster behavior uses a significantly
different model than that of level 0 and level 1.
</para>
<para>
The level 2 behavior is easy to describe, but it may be
difficult to understand in practical terms. In brief, level 2
performs no merging of clusters whatsoever.
</para>
<para>
This means that there is no initial base-and-mark merging step
(as is done in level 0), and it means that reordering moves and
ligature substitutions do not trigger a cluster merge.
</para>
<para>
Only one shaping operation directly affects clusters when using
level 2:
</para>
<itemizedlist>
<listitem>
<para>
When a cluster <emphasis>decomposes</emphasis>, all of the
resulting child clusters inherit as their cluster value the
cluster value of the parent cluster.
</para>
</listitem>
</itemizedlist>
<para>
When glyphs do form a ligature (or when some other feature
substitutes multiple glyphs with one glyph), the cluster value
of the first glyph is retained as the cluster value for the
resulting ligature.
</para>
<para>
This occurrence sounds similar to a cluster merge, but it is
different. In particular, no subsequent characters &mdash;
including marks and modifiers &mdash; are affected. They retain
their previous cluster values.
</para>
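<para>
The difference between the two ligature rules can be sketched on
the decomposed sequence from the level 0 and level 1 example,
<literal>A,BC0,BC1,BC2,D0,D1,E</literal> with values
0,1,1,1,3,3,4. This is a toy model of the rules as this chapter
describes them, not HarfBuzz's internal code. The function name is
hypothetical. When <literal>BC2</literal> and <literal>D0</literal>
form a ligature, the level 0 and level 1 rule yields 0,1,1,1,1,4.
The level 2 rule yields 0,1,1,1,3,4, so <literal>D1</literal> keeps
its value 3.
</para>
<programlisting language="C">
<![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy ligature over an array of cluster values: glyph i absorbs glyph
   i + 1 (requires i + 1 < *n).
   level2 == false: level 0/1 rule, every glyph whose value lies between
   the two merging values takes the smaller one.
   level2 == true: the ligature keeps glyph i's value and no other
   glyph changes. */
static void ligate_clusters (unsigned int *clusters, size_t *n, size_t i,
                             bool level2)
{
  if (!level2)
  {
    unsigned int a = clusters[i], b = clusters[i + 1];
    unsigned int lo = a < b ? a : b, hi = a < b ? b : a;
    for (size_t j = 0; j < *n; j++)
      if (clusters[j] >= lo && clusters[j] <= hi)
        clusters[j] = lo;
  }
  /* Remove the absorbed glyph; the ligature stays at index i. */
  memmove (&clusters[i + 1], &clusters[i + 2], (*n - i - 2) * sizeof *clusters);
  (*n)--;
}
]]>
</programlisting>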
<para>
Level 2 cluster behavior is ultimately less complex than level 0
or level 1, but there are several cases for which processing
cluster values produced at level 2 may be tricky.
</para>
<section id="ligatures-with-combining-marks-in-level-2">
<title>Ligatures with combining marks in level 2</title>
<para>
The first example of how HarfBuzz's level 2 cluster behavior
can be tricky is when the text to be shaped includes combining
marks attached to ligatures.
</para>
<para>
Let us start with an input sequence with the following
characters (top row) and initial cluster values (bottom row):
</para>
<programlisting>
A,acute,B,breve,C,circumflex
0,1 ,2,3 ,4,5
</programlisting>
<para>
If the sequence <literal>A,B,C</literal> forms a ligature,
then these are the cluster values HarfBuzz will return under
the various cluster levels:
</para>
<para>
Level 0:
</para>
<programlisting>
ABC,acute,breve,circumflex
0 ,0 ,0 ,0
</programlisting>
<para>
Level 1:
</para>
<programlisting>
ABC,acute,breve,circumflex
0 ,0 ,0 ,5
</programlisting>
<para>
Level 2:
</para>
<programlisting>
ABC,acute,breve,circumflex
0 ,1 ,3 ,5
</programlisting>
<para>
Making sense of the level 2 result is the hardest for a client
program, because there is nothing in the cluster values that
indicates that <literal>B</literal> and <literal>C</literal>
formed a ligature with <literal>A</literal>.
</para>
<para>
In contrast, the "merged" cluster values of the mark glyphs
that are seen in the level 0 and level 1 output are evidence
that a ligature substitution took place.
</para>
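<para>
One way for a client to look for that evidence is to check whether
any two glyphs share a cluster value. The hypothetical helper below
does this with a quadratic scan, so it also works on level 2
output, which need not be monotonic. It returns true for the level
0 and level 1 results above and false for the level 2 result. A
shared value can also come from a decomposition, so it signals that
some merge or decomposition occurred, not specifically a ligature.
</para>
<programlisting language="C">
<![CDATA[
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any two glyphs share a cluster value. Compares every
   pair, so it does not rely on the values being monotonic. */
static bool has_shared_cluster (const unsigned int *clusters, size_t num_glyphs)
{
  for (size_t i = 0; i < num_glyphs; i++)
    for (size_t j = i + 1; j < num_glyphs; j++)
      if (clusters[i] == clusters[j])
        return true;
  return false;
}
]]>
</programlisting>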
</section>
<section id="reordering-in-level-2">
<title>Reordering in level 2</title>
<para>
Another example of how HarfBuzz's level 2 cluster behavior
can be tricky is when glyphs reorder. Consider an input sequence
with the following characters (top row) and initial cluster
values (bottom row):
</para>
<programlisting>
A,B,C,D,E
0,1,2,3,4
</programlisting>
<para>
Now imagine <literal>D</literal> moves before
<literal>B</literal> in a reordering operation. The cluster
values will then be:
</para>
<programlisting>
A,D,B,C,E
0,3,1,2,4
</programlisting>
<para>
Next, if <literal>D</literal> forms a ligature with
<literal>B</literal>, the output is:
</para>
<programlisting>
A,DB,C,E
0,3 ,2,4
</programlisting>
<para>
However, in a different scenario, in which the shaping rules
of the script instead caused <literal>A</literal> and
<literal>B</literal> to form a ligature
<emphasis>before</emphasis> the <literal>D</literal> reordered, the
result would be:
</para>
<programlisting>
AB,D,C,E
0 ,3,2,4
</programlisting>
<para>
There is no way for a client program to differentiate between
these two scenarios based on the cluster values
alone. Consequently, client programs that use level 2 might
need to undertake additional work in order to manage cursor
positioning, text attributes, or other desired features.
</para>
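<para>
The ambiguity can be reproduced with a toy model of the level 2
rules described above: a moved glyph carries its own cluster value,
and a ligature keeps the value of its first glyph. The sketch below
tracks only the cluster values, and the comments note which glyph
each value belongs to. The function names are hypothetical. Both
scenarios start from 0,1,2,3,4 and end with the same array,
0,3,2,4.
</para>
<programlisting language="C">
<![CDATA[
#include <stddef.h>
#include <string.h>

/* Level 2 reordering: the glyph at index `from` moves back to index
   `to` (to < from) and keeps its own cluster value; nothing merges. */
static void move_back (unsigned int *clusters, size_t from, size_t to)
{
  unsigned int moved = clusters[from];
  for (size_t i = from; i > to; i--)
    clusters[i] = clusters[i - 1];
  clusters[to] = moved;
}

/* Level 2 ligature: glyph i absorbs glyph i + 1 (requires i + 1 < *n)
   and keeps its own cluster value; the absorbed value is dropped. */
static void ligate_keep_first (unsigned int *clusters, size_t *n, size_t i)
{
  memmove (&clusters[i + 1], &clusters[i + 2], (*n - i - 2) * sizeof *clusters);
  (*n)--;
}

/* Scenario 1: D (index 3) moves before B, then D and B ligate:
     A,B,C,D,E 0,1,2,3,4 -> A,D,B,C,E 0,3,1,2,4 -> A,DB,C,E 0,3,2,4
   Scenario 2: A and B ligate, then D (now index 2) moves before C:
     A,B,C,D,E 0,1,2,3,4 -> AB,C,D,E 0,2,3,4 -> AB,D,C,E 0,3,2,4 */
]]>
</programlisting>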
</section>
<section id="other-considerations-in-level-2">
<title>Other considerations in level 2</title>
<para>
There may be other problems encountered with ligatures under
level 2, such as if the direction of the text is forced to
the opposite of its natural direction (for example, Arabic text
that is forced into left-to-right directionality). But,
generally speaking, these other scenarios are minor corner
cases that are too obscure for most client programs to need to
worry about.
</para>
</section>
</section>
</chapter>