Pronunciation Lexicon Specification

The Pronunciation Lexicon Specification (PLS) is a W3C Recommendation designed to enable interoperable specification of pronunciation information for both speech recognition and speech synthesis engines within voice browsing applications. The language is intended to be easy for developers to use while supporting the accurate specification of pronunciation information for international use.

The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, vendor-specific alphabets. Pronunciations are grouped together into a PLS document, which may be referenced from other markup languages, such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML).

Usage

Here is an example PLS document:

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>judgment</grapheme>
     <grapheme>judgement</grapheme>
     <phoneme>ˈdʒʌdʒ.mənt</phoneme>
     <!-- IPA string is: "ˈdʒʌdʒ.mənt" -->
   </lexeme>
   <lexeme>
     <grapheme>fiancé</grapheme>
     <grapheme>fiancée</grapheme>
     <phoneme>fiˈɒns.eɪ</phoneme>
     <!-- IPA string is: "fiˈɒns.eɪ" -->
     <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
     <!-- IPA string is: "ˌfiː.ɑːnˈseɪ" -->
   </lexeme>
 </lexicon>

which could be used to improve TTS as shown in the following SSML 1.0 document:

 <?xml version="1.0" encoding="UTF-8"?>
 <speak version="1.0"
     xmlns="http://www.w3.org/2001/10/synthesis"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
       http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
     xml:lang="en-US">
   <lexicon uri="http://www.example.org/lexicon_defined_above.xml"/>
   <p> In the judgement of my fiancé, Las Vegas is the best place for a honeymoon.
     I replied that I preferred Venice and didn't think the Venetian casino was an
     acceptable compromise.</p>
 </speak>

but also to improve ASR in the following SRGS 1.0 grammar:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar version="1.0"
     xmlns="http://www.w3.org/2001/06/grammar"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2001/06/grammar
       http://www.w3.org/TR/speech-grammar/grammar.xsd"
     xml:lang="en-US" root="movies" mode="voice">
   <lexicon uri="http://www.example.org/lexicon_defined_above.xml"/>
   <rule id="movies" scope="public">
     <one-of>
       <item>Terminator 2: Judgment Day</item>
       <item>My Big Fat Obnoxious Fiance</item>
       <item>Pluto's Judgment Day</item>
     </one-of>
   </rule>
 </grammar>
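As a rough illustration of how an application might consume such a lexicon, the following Python sketch reads a PLS document with the standard xml.etree.ElementTree module and builds a table from each grapheme to its pronunciations. The names SAMPLE_PLS and load_pronunciations are hypothetical and chosen only for this example; a real speech engine would normally load the lexicon itself through the lexicon reference in the SSML or SRGS document.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the PLS example document.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Shortened copy of the example lexicon (schema attributes omitted for brevity).
SAMPLE_PLS = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>fiancé</grapheme>
    <grapheme>fiancée</grapheme>
    <phoneme>fiˈɒns.eɪ</phoneme>
    <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
  </lexeme>
</lexicon>
"""


def load_pronunciations(pls_text):
    """Return {grapheme: [phoneme strings]} for every lexeme in a PLS document."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(pls_text.encode("utf-8"))
    table = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [p.text.strip() for p in lexeme.findall("pls:phoneme", PLS_NS)]
        # Every grapheme of a lexeme shares all of that lexeme's pronunciations.
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            table.setdefault(grapheme.text.strip(), []).extend(phonemes)
    return table


LEXICON = load_pronunciations(SAMPLE_PLS)
```

Looking up LEXICON["judgement"] and LEXICON["judgment"] gives the same single transcription, while LEXICON["fiancé"] lists both alternatives in document order.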

Common use cases

Multiple pronunciations for the same orthography

For ASR systems it is common to rely on multiple pronunciations of the same word or phrase in order to cope with variations of pronunciation within a language. In the Pronunciation Lexicon language, multiple pronunciations are represented by more than one <phoneme> (or <alias>) element within the same <lexeme> element.

In the following example the word "Newton" has two possible pronunciations.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="ipa" xml:lang="en-GB">
   <lexeme>
     <grapheme>Newton</grapheme>
     <phoneme>ˈnjuːtən</phoneme>
     <!-- IPA string is: "ˈnjuːtən" -->
     <phoneme>ˈnuːtən</phoneme>
     <!-- IPA string is: "ˈnuːtən" -->
   </lexeme>
 </lexicon>
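An entry of this shape can also be generated programmatically. The Python sketch below builds a lexeme fragment with one phoneme child per pronunciation; make_lexeme is a hypothetical helper, and the PLS namespace is left off on the assumption that the fragment will be placed inside a lexicon element that declares it.

```python
import xml.etree.ElementTree as ET


def make_lexeme(grapheme, pronunciations):
    """Build a <lexeme> with one <grapheme> and one <phoneme> per pronunciation."""
    lexeme = ET.Element("lexeme")
    ET.SubElement(lexeme, "grapheme").text = grapheme
    for ipa in pronunciations:
        # Each alternative pronunciation becomes its own <phoneme> element.
        ET.SubElement(lexeme, "phoneme").text = ipa
    return lexeme


newton = make_lexeme("Newton", ["ˈnjuːtən", "ˈnuːtən"])
# encoding="unicode" returns a str and keeps the IPA characters unescaped.
xml_text = ET.tostring(newton, encoding="unicode")
```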

Multiple orthographies

In some situations there are alternative textual representations for the same word or phrase. This can arise for a number of reasons; see Section 4.5 of PLS for details. Because these are representations that have the same meaning (as opposed to homophones), it is recommended that they be represented using a single <lexeme> element that contains multiple graphemes.

Here are two simple examples of multiple orthographies: alternative spellings of an English word and multiple writings of a Japanese word.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="ipa" xml:lang="en-US">
   <!-- English entry showing how alternative spellings are handled -->
   <lexeme>
     <grapheme>colour</grapheme>
     <grapheme>color</grapheme>
     <phoneme>ˈkʌlər</phoneme>
     <!-- IPA string is: "ˈkʌlər" -->
   </lexeme>
 </lexicon>

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="ipa" xml:lang="ja">
   <!-- Japanese entry showing romaji, kanji and hiragana orthographies -->
   <lexeme>
     <grapheme>nihongo</grapheme>
     <grapheme>日本語</grapheme>
     <grapheme>にほんご</grapheme>
     <phoneme>ɲihoŋɡo</phoneme>
     <!-- IPA string is: "ɲihoŋɡo" -->
   </lexeme>
 </lexicon>

Homophones

Most languages have homophones, words with the same pronunciation but different meanings (and possibly different spellings), for instance "seed" and "cede". It is recommended that these be represented as different lexemes.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>cede</grapheme>
     <phoneme>siːd</phoneme>
     <!-- IPA string is: "siːd" -->
   </lexeme>
   <lexeme>
     <grapheme>seed</grapheme>
     <phoneme>siːd</phoneme>
     <!-- IPA string is: "siːd" -->
   </lexeme>
 </lexicon>
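Because homophones are kept as separate lexemes, they can be found by grouping entries on their transcription. The Python sketch below assumes the lexicon has already been reduced to (grapheme, IPA) pairs; LEXEMES and homophone_groups are hypothetical names, and the "bass" entry is included only as a non-matching control.

```python
from collections import defaultdict

# Hypothetical in-memory view of a lexicon: one (grapheme, IPA) pair per lexeme.
LEXEMES = [("cede", "siːd"), ("seed", "siːd"), ("bass", "bæs")]


def homophone_groups(lexemes):
    """Return {ipa: sorted graphemes} for transcriptions shared by several words."""
    by_sound = defaultdict(set)
    for grapheme, ipa in lexemes:
        by_sound[ipa].add(grapheme)
    # Keep only transcriptions that map to more than one distinct spelling.
    return {ipa: sorted(words) for ipa, words in by_sound.items() if len(words) > 1}


GROUPS = homophone_groups(LEXEMES)
```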

Homographs

Most languages have words with different meanings but the same spelling (and sometimes different pronunciations), called homographs. For example, in English the word bass (fish) and the word bass (in music) have identical spellings but different meanings and pronunciations. It is recommended that such words be represented using separate <lexeme> elements distinguished by different values of the role attribute (see Section 4.4 of PLS 1.0). However, if a pronunciation lexicon author does not want to distinguish between the two words, they can simply be represented as alternative pronunciations within the same <lexeme> element. In the latter case the TTS processor will not be able to tell when to apply the first or the second transcription.

In this example the pronunciations of the homograph "bass" are shown.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>bass</grapheme>
     <phoneme>bæs</phoneme>
     <!-- IPA string is: bæs -->
     <phoneme>beɪs</phoneme>
     <!-- IPA string is: beɪs -->
   </lexeme>
 </lexicon>

Note that English contains numerous examples of noun-verb pairs that can be treated either as homographs or as alternative pronunciations, depending on author preference. Two examples are the noun/verb "refuse" and the noun/verb "address".

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     xmlns:mypos="http://www.example.org/my_pos_namespace"
     alphabet="ipa" xml:lang="en-US">
   <lexeme role="mypos:verb">
     <grapheme>refuse</grapheme>
     <phoneme>rɪˈfjuːz</phoneme>
     <!-- IPA string is: "rɪˈfjuːz" -->
   </lexeme>
   <lexeme role="mypos:noun">
     <grapheme>refuse</grapheme>
     <phoneme>ˈrɛfjuːs</phoneme>
     <!-- IPA string is: "ˈrɛfjuːs" -->
   </lexeme>
 </lexicon>
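How a processor might use the role attribute to pick between such entries can be sketched in Python as follows. This is a simplified illustration: the hypothetical pronounce function compares the role value as a literal string such as "mypos:verb" instead of resolving the prefix to its namespace, and it assumes the caller already knows the part of speech of the word.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Shortened copy of the role-based example (schema attributes omitted).
ROLE_PLS = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:mypos="http://www.example.org/my_pos_namespace"
    alphabet="ipa" xml:lang="en-US">
  <lexeme role="mypos:verb">
    <grapheme>refuse</grapheme>
    <phoneme>rɪˈfjuːz</phoneme>
  </lexeme>
  <lexeme role="mypos:noun">
    <grapheme>refuse</grapheme>
    <phoneme>ˈrɛfjuːs</phoneme>
  </lexeme>
</lexicon>"""


def pronounce(pls_text, word, role):
    """Return the first phoneme of the lexeme matching both word and role, or None."""
    root = ET.fromstring(pls_text)
    for lexeme in root.findall("pls:lexeme", NS):
        # Split on whitespace in case an entry lists more than one role.
        roles = lexeme.get("role", "").split()
        graphemes = [g.text for g in lexeme.findall("pls:grapheme", NS)]
        if role in roles and word in graphemes:
            return lexeme.find("pls:phoneme", NS).text
    return None


VERB = pronounce(ROLE_PLS, "refuse", "mypos:verb")
NOUN = pronounce(ROLE_PLS, "refuse", "mypos:noun")
```

With the two lexemes kept apart, VERB and NOUN receive different transcriptions; had both phonemes been placed in one lexeme without roles, there would be nothing to select on.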

Pronunciation by orthography

For some words and phrases, pronunciation can be expressed quickly and conveniently as a sequence of other orthographies. The developer is not required to have linguistic knowledge, but instead relies on pronunciations that are expected to be available already. To express pronunciations using other orthographies, the <alias> element may be used.

This feature may be very useful for dealing with acronym expansion.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
       http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="ipa" xml:lang="en-US">
   <!-- Acronym expansion -->
   <lexeme>
     <grapheme>W3C</grapheme>
     <alias>World Wide Web Consortium</alias>
   </lexeme>
   <!-- Number representation -->
   <lexeme>
     <grapheme>101</grapheme>
     <alias>one hundred and one</alias>
   </lexeme>
   <!-- Crude pronunciation mechanism -->
   <lexeme>
     <grapheme>Thailand</grapheme>
     <alias>tie land</alias>
   </lexeme>
   <!-- Crude pronunciation mechanism and acronym expansion -->
   <lexeme>
     <grapheme>BBC 1</grapheme>
     <alias>be be sea one</alias>
   </lexeme>
 </lexicon>
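The effect of alias entries can be approximated as a text substitution performed before pronunciation lookup. The Python sketch below is only an approximation under that assumption: ALIASES mirrors the entries above, expand_aliases is a hypothetical helper, and an actual speech processor may apply aliases at a different stage of its pipeline.

```python
import re

# Grapheme -> alias text, mirroring the <alias> entries of the example lexicon.
ALIASES = {
    "W3C": "World Wide Web Consortium",
    "101": "one hundred and one",
    "Thailand": "tie land",
    "BBC 1": "be be sea one",
}


def expand_aliases(text, aliases):
    """Replace each whole-token occurrence of a grapheme with its alias text."""
    # Try longer graphemes first so multi-word entries such as "BBC 1" win.
    keys = sorted(aliases, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    # The lookarounds stop "101" from matching inside a longer token like "1010".
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: aliases[m.group(1)], text)


EXPANDED = expand_aliases("The W3C was on BBC 1 in Thailand.", ALIASES)
```

The expanded text contains only ordinary words, so the remaining pronunciations can come from the engine's built-in lexicon without any phonetic transcription being written by hand.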

Status and future

  • PLS 1.0 reached W3C Recommendation status on 14 October 2008.
