7.1 The Language Data File
The basic format of the language data file is the same as that of the
Aspell configuration file.  It is named lang.dat and is located in the
architecture-independent data directory for Aspell (option data-dir),
which is usually prefix/share/aspell.  Use aspell config to find out
where it is in your installation.  By convention, the language name
should be the two-letter ISO 639 language code if one exists; if not,
use the three-letter code.
   
The language data file has two mandatory fields and several optional
ones.  All fields are case sensitive and should be in all lower case.
   
The two mandatory fields are name and charset.
   
name is the name of the language and should be the same as the
file name (without the .dat).
   
charset is the 8-bit character set Aspell will expect the word lists
to be formatted in.  If possible, choose one of the standard ones
provided with Aspell.  These are `iso-8859-*', `koi8-*', and `viscii'.
If your language does not require any non-ASCII characters, choose
`iso-8859-1'.  If none of these standard character sets is suitable
for your language, then you can create a new one.  See Creating A New
Character Set.
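
For illustration, a minimal language data file containing only the two
mandatory fields might look like the sketch below.  The language code xx
(and therefore the file name xx.dat) is a hypothetical placeholder, not a
real language, and iso-8859-1 is used on the assumption that the language
needs no non-ASCII characters.

          name xx
          charset iso-8859-1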
   
The optional fields are as follows:
     
data-encoding
The encoding the language data files are expected to be in, as well as
the default encoding to use when saving the personal dictionaries.  It
can be either `utf-8' or any of the 8-bit encodings that Aspell
supports.  If not set, it defaults to the value of charset.
     
special
Non-letter characters that can appear in words in your language, such
as `'' and `-'.  The value is a list separated by spaces.  Each item of
the list has the following format:
     
          <char> <begin><middle><end>
     
     
char is the non-letter character in question.  begin, middle, and end
are each either a `-' or a `*'.  A `*' for begin means that the
character can begin a word; a `-' means it can't.  The same is true for
middle and end.  For example, the entry for the `'' in English is:
     
          ' -*-
     
     
To include more than one special character, just list the entries one
after another on the same line.  For example, to make both the `'' and
the `-' a middle character, use the following line in the language data
file:
     
          special ' -*- - -*-
     
     
soundslike
The name of the soundslike data for the language.  The data is
expected to be in the file name_phonetic.dat.
     
If name is `simple' then a very simple soundslike is used.  This is
nearly as powerful as the full phonetic soundslike, but it can be
computed a lot faster.  (See The Simple Soundslike.)
     
If the soundslike name is `none', or this option is not specified,
then no soundslike will be used.  The effective soundslike is then the
word converted to all lowercase and possibly with accents stripped,
depending on the store-as option.  For languages with phonetic
spelling, the difference between using soundslike data and not using it
will not be very noticeable.  However, for languages with non-phonetic
spelling there will be a noticeable difference.  The difference you
notice will depend on the quality of the soundslike data file.  If you
do not notice much of a difference for a language with non-phonetic
spelling, that is a good indication that the soundslike data is not
rough enough, or that the words you are trying are not that badly
misspelled.
     
invisible-soundslike
Avoid storing the soundslike information with the word.  Instead, it is
computed as needed.  This option defaults to true if the soundslike is
`none' or `simple', and false when a phonetic soundslike is used.
     
repl-table
See Replacement Tables.
     
keyboard
The base name of the keyboard definition file to use.  For more
information see Notes on Typo-Analysis.
     
sug-split-char
The characters that may be inserted between the two resulting words
when a word is split.  This is a list option.
     
affix
affix-compress
partially-expand
See Affix Compression.
     
store-as
How the words are indexed in the dictionary.  If "stripped", then the
word is indexed in a lower case and de-accented form.  If "lower", then
the word is indexed in a lower case form but with accent info still
intact.  This just controls how the word is indexed, not how it is
stored.  The default is "stripped" unless affix compression is used.
     
norm-required
Should be set to true if your language makes use of private use
characters or when Normalization Form C is not the same as
full composition.
     
normalize
norm-form

Additional options include those that control how run-together words
are handled.  They work the same way as they do in the normal
configuration files.  For more information, see Controlling the
Behavior of Run-together Words.
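
Putting several of the fields above together, a language data file
might look like the following sketch.  The language code xx and the file
name xx.dat are hypothetical placeholders, and the values are chosen
only for illustration from those described above.  It also assumes that
lines starting with # are treated as comments, as they are in the Aspell
configuration file.

          # xx.dat: hypothetical example, not shipped with Aspell
          name xx
          charset iso-8859-1
          data-encoding utf-8
          special ' -*- - -*-
          soundslike none
          store-as lower

With soundslike set to `none' and store-as set to "lower", words would
be indexed in lower case with their accents intact.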
   
