Grapheme clusters fail to represent syllabic conjuncts in Tamil #72

Open
r12a opened this issue Feb 4, 2020 · 3 comments
Labels
doc:taml gap i:segmentation Grapheme/word segmentation & selection l:ta Tamil language & script p:basic s:taml Tamil script x:taml

Comments

@r12a
Contributor

r12a commented Feb 4, 2020

The Unicode concept of 'grapheme cluster' currently fails to represent the small number of conjuncts that are used in modern Tamil, i.e. kṣa க்ஷ and the two alternative sequences for srī, ஶ்ரீ and ஸ்ரீ. This means that editing operations, line-breaking algorithms, vertical text layout, etc. are liable to break text at the wrong point when those conjuncts are used. For more details, see the relevant sections.
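
To make the problem concrete, here is a minimal Python sketch of where the breaks fall for the three sequences. It does not implement the full UAX #29 rules: the helper `approx_grapheme_clusters` is a hypothetical simplification that attaches every combining mark (general category Mn, Mc or Me) to the preceding character, which is assumed to be close enough to the default behaviour for plain Tamil text.

```python
import unicodedata

def approx_grapheme_clusters(text):
    """Rough stand-in for default grapheme clusters (assumption: this is
    a simplification, not the full UAX #29 rule set). Each combining mark
    is attached to the character before it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch   # mark joins the current cluster
        else:
            clusters.append(ch)  # any other character starts a new one
    return clusters

# The three conjunct sequences named above, written as escapes.
SAMPLES = {
    "ksha": "\u0b95\u0bcd\u0bb7",
    "sri (with U+0BB6)": "\u0bb6\u0bcd\u0bb0\u0bc0",
    "sri (with U+0BB8)": "\u0bb8\u0bcd\u0bb0\u0bc0",
}

for name, text in SAMPLES.items():
    parts = approx_grapheme_clusters(text)
    # Print each cluster as a list of code points.
    print(name, [" ".join(f"U+{ord(c):04X}" for c in p) for p in parts])
```

Under this approximation each sequence comes out as two clusters, with the boundary falling directly after the pulli (U+0BCD), so a caret movement or line break at a cluster boundary can land inside the conjunct.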

Indic Layout Requirements provides a grammar for Indian orthographic syllable boundaries that works for the consonant clusters in Tamil which don't use conjuncts.

Specs:
CSS uses the concept of 'typographic character unit', rather than grapheme cluster, in its specs with the explanation that these cases are beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support.
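
As an illustration of what such implementation-level support might look like, the Python sketch below tailors a simplified cluster segmentation so that the three conjunct sequences listed in this issue stay together. This is neither the Indic Layout Requirements grammar nor anything defined by CSS: `approx_grapheme_clusters`, `tailored_units` and the `CONJUNCTS` table are hypothetical names, and the base segmentation is only an approximation of the default rules.

```python
import unicodedata

VIRAMA = "\u0bcd"  # Tamil pulli

# Assumption: only the three sequences named in this issue are kept together.
CONJUNCTS = (
    "\u0b95\u0bcd\u0bb7",        # ksha
    "\u0bb6\u0bcd\u0bb0\u0bc0",  # sri, starting with U+0BB6
    "\u0bb8\u0bcd\u0bb0\u0bc0",  # sri, starting with U+0BB8
)

def approx_grapheme_clusters(text):
    """Simplified clusters: combining marks attach to the previous character."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def tailored_units(text):
    """Merge a pulli-final cluster with the next one when the pair
    begins with one of the known conjunct sequences."""
    clusters = approx_grapheme_clusters(text)
    units = []
    i = 0
    while i < len(clusters):
        current = clusters[i]
        if (i + 1 < len(clusters)
                and current.endswith(VIRAMA)
                and (current + clusters[i + 1]).startswith(CONJUNCTS)):
            units.append(current + clusters[i + 1])
            i += 2
        else:
            units.append(current)
            i += 1
    return units

with_conjunct = "\u0baa\u0b95\u0bcd\u0bb7\u0bbf"           # contains ksha + vowel sign
without_conjunct = "\u0baa\u0b95\u0bcd\u0b95\u0bae\u0bcd"  # consonant cluster, no conjunct

print(len(tailored_units(with_conjunct)), len(tailored_units(without_conjunct)))
```

In this sketch the consonant cluster without a conjunct is segmented exactly as before, while the kṣa sequence and its vowel sign come out as a single unit. A fixed lookup table was chosen only because the issue names a closed set of conjuncts; a grammar-based approach would be needed for anything more general.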

@r12a r12a added i:segmentation Grapheme/word segmentation & selection gap p:basic doc:taml labels Feb 4, 2020
@r12a
Contributor Author

r12a commented Feb 4, 2020

The first comment in this issue contains text that will automatically appear in the Tamil gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

@miloush

miloush commented Feb 8, 2020

Does this refer to ஸ்ரீ and க்ஷ only?

@r12a
Contributor Author

r12a commented Feb 10, 2020

Well, those two and ஶ்ரீ also (the Unicode-recommended sequence for shri). The text in this section was very old and I made some badly needed edits. (See the 'edited' pulldown for the initial comment.)

@r12a r12a changed the title from "Grapheme clusters fail to represent syllabic conjuncts" to "Grapheme clusters fail to represent syllabic conjuncts in Tamil" May 18, 2021
@r12a r12a added the x:taml label May 18, 2021
@r12a r12a added the l:ta Tamil language & script label May 1, 2024
@r12a r12a moved this to Issue identified, needing investigation in Gap-analysis pipeline Jun 20, 2024
@r12a r12a added the s:taml Tamil script label Jul 2, 2024
Projects
Status: Issue identified, needing investigation
2 participants