LanguageTool Change Log

0.9.2 (2008-02-17)

-LanguageTool is now part of the "Tools" menu in OpenOffice.org, and its toolbar button is docked to the standard toolbar by default.
-LanguageTool's OpenOffice.org configuration is now saved in the user's home directory.
-Added preliminary support for Swedish (thanks to Niklas Johansson).
-More false friends, especially Polish-English, and generalization of existing false friend rules for English and German.
-Added possibility to display a false friend list as a dictionary formatted in HTML (rules/print-ff.xsl).
-Added an attribute to the DTD so XML rules and groups of rules can be disabled by default by specifying default="off". Java rules can be turned off by default by using the setDefaultOff() method. Both settings can be overridden in the Options dialog box.
-Added possibility to set hot keys for UI items in translation. Corrected French translation (thanks to Hugo Voisard).
-Fixed bug in OOo extension that prevented false friend rules from being displayed in the configuration dialog.
-Added the possibility to disable complete rule groups (categories) in the GUI.
-Fixed some Polish rules. Added more Dutch rules and several hundred more French rules.
-XML rules: added possibility to refer to matched tokens' POS in elements inside . This is useful for defining agreement without combinatorial explosion.
-OOo extension will be distributed in .oxt format, which allows automatic updates.
-Internal change: updated tagger library and streamlined some tagger code.

0.9.1 (2007-09-17)

-Fixed a bug with the resource paths that was discovered in OpenOffice.org 2.3 and that made it impossible to use LanguageTool with OpenOffice.org 2.3.
-XML rules: enabled referring to whole phrases in suggestions, and changing their POS tags appropriately. Added support for attribute scope="previous" in exceptions.
-Fixed error marking problems for XML rules that use phrases and skipping.
-Added and tweaked some Polish and English rules.
-Fixed problems with badly written XML rules that used skipping.
-LanguageTool now requires Java 6.0 to be compiled, but it still runs with Java 5.0.
-The icon in the system tray now offers a menu when clicking the right mouse button (Java 6.0 only).
-Several people had problems with the LanguageTool dialog appearing in the background (when using LanguageTool inside OpenOffice.org). A workaround has been added that tries to fix this problem.
-The embedded webserver can now work with UTF-8. Input must now always be in UTF-8 (or 7-bit ASCII, as that is a subset of UTF-8).
-Fixed bug (#1726044 in sf.net tracking system) concerning the Italian tagger dictionary that resulted in a badly formatted dictionary file.
-The stand-alone LanguageTool can now load an XML rule file using the new menu item "Add language...". This way people can write rules and use them without touching any Java code and without the need for the rule file to be distributed with LanguageTool. Currently, the file must be named in the format "rules-xx-name.xml", where "xx" is the two-character language code and "name" is the full name of the language in English. Example: rules-en-English.xml
 TODO: needs more polishing: i18n, store and load configuration, add the same to OOo

0.9 (2007-05-20)

-It's possible to edit rules using XML Mind XML Editor (xxe) in a WYSIWYM (what you see is what you mean) mode. Simply open the file in xxe, and it should display in a form input mode. It is recommended that you uncheck Options > Preferences > Save > Add open lines. Set a high value in Max line length.
-Now you can simply open rule files (rules//grammar.xml) with Firefox to see examples of errors and descriptions of rules in a human-readable format.
-A lot of new French rules by Agnes Souque (mostly grammar errors) and Hugo Voisard (mostly stylistic errors).
-Everyone can now develop, modify, and test grammar rules without requiring the LanguageTool source code.
 Just run the command below; it runs all tests in all grammar.xml files (tests are the sentences in the ... tags).
 If you work in Linux, type:
   sh testrules.sh
 In Windows command line, type:
   testrules.bat
-German compound nouns are now split so that compound words which are not in the dictionary are also recognized. This way more agreement errors can be detected, e.g. "die Donaudampfschiff". The dictionary for jWordSplitter was created using the new class ExportGermanNouns.
-New option -u or --list-unknown for the command line version: lists all words that are not known, i.e., words without POS information. This currently also lists words written with an uppercase first letter unless they appear at the beginning of a sentence.
-Added support for variables in XML pattern rules and disambiguation rules; it is now possible to refer to previously matched tokens using element.
-Added support for logical AND operation for combining tokens in error pattern and disambiguation rules (XML-based). In other words, all possible logical combinations are now available in the rule declaration.
-Added support for XML-based rule disambiguation of POS tags (based on an idea by Agnes Souque).
-Added support for regular expressions and POS tags to create complex suggestions (for example, suggesting another grammatical case or a word without an apostrophe). Suggestions based on selecting POS tags require a special synthesis dictionary, compiled for the language. Currently, two such dictionaries are available: for English and Polish.
-Added initial support for Slovenian (Martin Srebotnjak)
-Fixed and refined some faulty suggestions for Polish, English, and Spanish rules.
-Added special POS tag value "UNKNOWN" that matches any token without any part-of-speech tag. Useful for matching proper names or misspelled words that are not in the dictionary.
-The tray icon should now also work on the Mac.
-Added option to test suggested replacements in internal quality tests.
-Refined and added more Polish and English rules with new suggestion features.
-Fixed bug with saving preference for mother tongue (for false friend checking) in the stand-alone GUI and OOo extension.
-Fixed bug for users whose Windows username contains special characters; the OpenOffice.org extension can now read its files correctly.
-Internal change: changed source directory structure; files are now accessed based on the classpath and not the file structure (a more portable way).

0.8.9 (2007-04-07)

-Security: The embedded webserver didn't escape HTML when an error message was returned, which made it possible to inject JavaScript via cross-site scripting (thanks to "sumit")
-A lot of new Polish rules
-Improved German rules to give fewer false alarms
-Improved English rules
-Added initial support for Czech (by Jozef Licko)
-Added initial support for Ukrainian (by Andriy Rysin)
-Make OpenOffice.org integration work with NeoOffice (by Ray Hazlip)
-Added disambiguator interface used to filter POS tags (by Jozef Licko)
-Enabled use of much more compact dictionaries for languages with heavy use of infixes or prefixes (mostly Slavic languages).
-Fixed a bug with suggestions for some rules that match correlated words in the sentence, and a bug with matching in some of them.
-New UnpairedQuotesBracketsRule (for pairing (), [], {}, „”, »«, depending on the language); works on a paragraph level. Additionally, new mechanisms for paragraph- and text-level rules were introduced.
-New French punctuation rule for inserting a non-breakable space before '!', '?', ':', and ';' and after or before French quotation marks ('«' and '»').
-Fixed crash when started in an unsupported locale
-Added support for declaring phrases (or chunks) in rules, which allows reusing the same structures in various rules. This could be thought of as simplified chunking.
-The command line version (LanguageTool.jar) can now be started from any directory.
 In previous versions, you had to be in the LanguageTool installation directory to start it.
-Internal change: rules are loaded dynamically to make adding a new language easier in the future (patch by Andriy Rysin)

0.8.8 (2007-01-30)

-XML output of HTTP interface changed for compatibility with An Gramadoir (http://borel.slu.edu/gramadoir/), see resource/api-output.dtd for the new format. Fixed some encoding problems with the HTTP response.
-Added --api option to Main method so the XML output can be tested on the command line.
-Improved English, Dutch and Polish rules
-Fixed several false alarms in German agreement rule (e.g., for "Dies liegt aller Kommunikation zugrunde." and "Der Vorgang, der soziale Akteure miteinander verknüpft.")
-Improved German coherency rule, e.g. using "Ketchups" and "Ketschups" in one document will lead to an error
-German part-of-speech information can now be added to resource/de/added.txt
-Slightly improved suggestions of German compound rule, e.g., "x mal" will suggest "x-mal", not "xmal"
-Fixed false alarm in German "wieder vs. wider" rule
-Command line version now stops with error message if unknown options are used

0.8.7 (2006-12-31)

-Extended and improved English, Polish, and Dutch rules
-Added some Lithuanian rules (Thanks to Mantas Kriaučiūnas)
-Improved UppercaseSentenceStartRule for sentences that start with a quote character
-Bugfix: Small fixes for German compound rule (e.g., "New Yorker" and "New-Yorker" are both okay)
-Bugfix: Don't crash but show error if LanguageTool is used with Java <= 1.4 inside OpenOffice.org
-Bugfix: LanguageTool stopped with a MissingResourceException when started in a locale that's not supported (e.g., fr_CA = French, Canada)
-Added French translation of the GUI (Thanks to Hugo Voisard)

0.8.6 (2006-12-17)

-Added support for Dutch (Thanks to Ruud J.
 Baars)
-Added initial support for Lithuanian (no XML rules yet)
-Error rules slightly improved for English, German, and Polish
-Small improvements to the sentence tokenizer
-Command line version and stand-alone version now print suggested fixes
-New rule for German that complains about the spelling of compound words that are written as separate words, e.g., for "System Administrator" LanguageTool suggests "System-Administrator" and "Systemadministrator". See resource/de/compounds.txt for a list of words that this rule detects.
-OpenOffice.org integration:
  -fixed that some error messages (e.g., Language not supported) could freeze the OpenOffice.org GUI
  -the "server mode" configuration is now only available in stand-alone mode, as it doesn't make sense inside OpenOffice.org
  -configuration dialog is now mostly translated
  -error marking corrected for documents that contain index markers
  -improved icon
-Stand-alone GUI:
  -improved tray icon
  -added shortcuts for its menu items
  -new menu item "Check Text in Clipboard"
  -will now hide itself to the system tray when clicking the window close button when started in system tray mode (with -t or --tray). Use "Quit" from the menu to exit the program completely.
  -user interface is now mostly translated
-Debugging output (-v) of German texts has been improved; it now shows the part-of-speech tags as they are used during rule matching

0.8.5 (2006-10-30)

-Fixed small bugs connected with special HTML characters used internally and entities encoding
-OpenOffice.org integration: documents > 64K can now be checked.
 WARNING: There's still a problem with documents that contain tables: the wrong part of the text will be selected.
-OpenOffice.org integration: progress dialog now really shows the progress
-Introduced new POS tags for uncountable nouns (NN:U) and for nouns that can be used as uncountable (NN:UN). Added most common proper names to the English tagger dictionary.
-LanguageTool can now run as a server that returns its results in a simple XML format via HTTP. Activate the server using "File -> Options..." or start it via command line using
   java -cp jaminid.jar:LanguageTool.jar de.danielnaber.languagetool.server.HTTPServer
-By default, a single linebreak is no longer treated as a paragraph delimiter (this was introduced in 0.8.4). With the command line application, you can use the new option -b to make LanguageTool take single linebreaks as paragraph delimiters again. Two consecutive linebreaks are always considered a paragraph boundary (and thus a sentence boundary).
-Added initial French, Spanish and Italian support
-Added new rules for English and Polish

0.8.4 (2006-08-20)

-Added several new rules for English, German, and Polish
-The false friend rules now offer a possible correction
-Stand-alone GUI: the program can now be docked to the system tray. Selecting any text and clicking on the icon in the system tray pops up the main window and checks the selected text. You need to unzip standalone-libs.zip for this feature to work (this was necessary so OpenOffice.org, which works on the same ZIP, doesn't get confused).
-Stand-alone GUI: now stores its configuration to a file ".languagetool.cfg" in the user's home directory. Still buggy, as it only stores the configuration for one text language.
-OpenOffice.org integration: cursor doesn't jump to start anymore before it selects the error
-OpenOffice.org integration: LanguageTool's suggestions can now be accepted using a double click
-A single line break in texts will now be interpreted as the end of a paragraph and thus also as the end of a sentence.
-Reduced size of LanguageTool by using a more compact part-of-speech tagger
-Changed DTD to support more XML-like rule encoding, added categories
-Extended exception matching to multiple tokens, added more flexible ways to match variable word order languages (via skipping tokens in rules)
-Added lemmatization support, regular expression support
-Added AnalyzedTokenReadings class to support multiple readings in all taggers, changed the code appropriately
-Some API changes

0.8.3 (2006-06-25)

-Requires Java 1.5 now
-Support for Polish (Marcin Milkowski)
-Fixed a NullPointerException in DoublePunctuationRule
-Added information to the README about how to compile LanguageTool for .NET using IKVM
-OpenOffice.org: The selection of the potentially incorrect text was wrong after paragraphs (on Windows only)

0.8.2 (2006-02-26)

-New menu entry "Configuration..."
-If you're a native speaker of German and write a text in English, LanguageTool can now warn you if you're using a "false friend", i.e., an English word that sounds like a German word but has a different meaning (e.g., become <-> bekommen). This also works vice versa (i.e., English native speakers will get a warning when writing a German text using a false friend).
-Added a few new pattern rules
-Missing whitespace after a comma is now detected
-Added a new German rule that detects missing agreement in e.g., "meine Auto" (correct: "mein Auto"). This also introduces some new false alarms.
-Errors for German text are now displayed in German
-Several other small fixes
-Only relevant for developers: changes to rules.dtd (affect "lang" attribute)
-Only relevant for developers: XML validation doesn't happen at runtime anymore but only when "ant test" is called

0.8.1 (2005-12-22)

-Made it work for cases where the add-on installation path contains spaces (often seems to be the case under Windows)
-Added a few rules for English
-Added more exceptions so the English "a vs.
 an" rule works better
-Other small fixes

0.8 (2005-12-17)

-Added OpenOffice.org integration. Install the LanguageTool ZIP via Tools -> Package Manager -> Add..., re-start OpenOffice.org and you'll have a new menu item "LanguageTool" which checks your text.
-The German data is now more compact (but still 40MB when uncompressed) and it's included in the standard distribution
-Fixed morphology data for a few German words
-Added some new simple pattern rules for English and German
-Switched to new English part-of-speech data and new version of part-of-speech tagger (OpenNLP-tools 1.3)
-Files are called LanguageTool instead of JLanguageTool again as the Java version is the only maintained version anyway

0.7.2 (2005-11-21)

-several new pattern rules for typical German and English typos
-new rule that checks for ".." and ",,"
-new CaseRule for German text: complains if non-nouns are written in uppercase and checks uppercase spelling of "substantivierte Verben", e.g., "Das Laufen fällt mir leicht."
-extension of pattern syntax: DT^foo|bar will match words tagged as "DT", unless the word is "foo" or "bar"
-fixed several false alarms (i.e., rules matching text which is actually correct)
-fixed error marking for matches of pattern rules
-fixed morphology data for these German words: Papagei, Virus, Stasi

0.7.1 (2005-09-08)

-Rules can now be enabled/disabled in GUI
-the error displayed to the user is now taken from the "message" element, not from the "name" attribute anymore
-implemented mark_from and mark_to attributes, these can be used to expand/shrink the area of text that is marked as wrong. Both attributes default to 0 so that the text marked as wrong is exactly the one that matches the pattern.
-back references like \1 can now be used in error messages to refer to the parts of the original text that are matched by the pattern
-a few German noun phrases like "der große Mann" are now checked for agreement. For example, "des großer Mannes" is detected as incorrect.
 This requires data in rules/de/categories which needs to be downloaded separately.
-a new rule checks whether a sentence starts with an uppercase letter

0.7 (2005-08-15)

-first release of Java rewrite
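
Illustration for the 0.9.1 entry on "Add language...": the file naming format "rules-xx-name.xml" described there can be checked with a small regular expression. The sketch below is not LanguageTool code; the class RuleFileName and its parse method are hypothetical names, and it assumes that "xx" consists of exactly two ASCII letters and that "name" is any non-empty text without path separators.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of LanguageTool: parses "rules-xx-name.xml". */
final class RuleFileName {

    // Assumption: "xx" is two ASCII letters; "name" is non-empty and has no path separators.
    private static final Pattern FORMAT =
            Pattern.compile("rules-([A-Za-z]{2})-([^/\\\\]+)\\.xml");

    final String languageCode;  // e.g. "en"
    final String languageName;  // e.g. "English"

    private RuleFileName(String languageCode, String languageName) {
        this.languageCode = languageCode;
        this.languageName = languageName;
    }

    /** Returns the parsed parts, or an empty Optional if the name does not fit the format. */
    static Optional<RuleFileName> parse(String fileName) {
        Matcher m = FORMAT.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new RuleFileName(m.group(1), m.group(2)));
    }

    public static void main(String[] args) {
        // The example file name is taken from the 0.9.1 entry.
        RuleFileName parsed = parse("rules-en-English.xml").get();
        System.out.println(parsed.languageCode + " / " + parsed.languageName);
        // A file that does not follow the naming format is rejected.
        System.out.println(parse("grammar.xml").isPresent());
    }
}
```

Returning an empty Optional for non-matching names lets a caller show its own error message instead of handling an exception; how the real "Add language..." dialog reacts to a badly named file is not stated in this change log.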