de.danielnaber.languagetool.tagging
Class BaseTagger

Object
  extended by BaseTagger
All Implemented Interfaces:
Tagger
Direct Known Subclasses:
CzechTagger, DanishTagger, DutchTagger, EnglishTagger, FrenchTagger, GalicianTagger, ItalianTagger, PolishTagger, RomanianTagger, RussianTagger, SlovakTagger, SpanishTagger, SwedishTagger, UkrainianMorfoTagger

public abstract class BaseTagger
extends Object
implements Tagger

Base tagger using Lametyzator. Language-specific subclasses supply the resource to load via getFileName().

Author:
Marcin Milkowski

Constructor Summary
BaseTagger()
           
 
Method Summary
 AnalyzedTokenReadings createNullToken(String token, int startPos)
          Create the AnalyzedToken used for whitespace and other non-words.
 AnalyzedToken createToken(String token, String posTag)
          Create a token specific to the language of the implementing class.
abstract  String getFileName()
          Get the filename of the dictionary resource, e.g., /resource/fr/french.dict.
 void setLocale(Locale loc)
           
 List<AnalyzedTokenReadings> tag(List<String> sentenceTokens)
          Returns a list of AnalyzedTokenReadings objects that assign each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BaseTagger

public BaseTagger()
Method Detail

getFileName

public abstract String getFileName()
Get the filename of the dictionary resource, e.g., /resource/fr/french.dict.
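
A minimal sketch of how a language-specific subclass might satisfy this method. It does not use the real LanguageTool classes: BaseTaggerSketch, FrenchTaggerSketch and GetFileNameDemo are hypothetical stand-ins that model only the abstract method documented here, and the path is the example value given above.

```java
// Hypothetical stand-in that models only the documented abstract method.
// The real BaseTagger lives in de.danielnaber.languagetool.tagging.
abstract class BaseTaggerSketch {
    public abstract String getFileName();
}

// Hypothetical subclass: it only has to say which resource to load.
class FrenchTaggerSketch extends BaseTaggerSketch {
    @Override
    public String getFileName() {
        // Example path taken from the method description above.
        return "/resource/fr/french.dict";
    }
}

class GetFileNameDemo {
    public static void main(String[] args) {
        BaseTaggerSketch tagger = new FrenchTaggerSketch();
        System.out.println(tagger.getFileName());
    }
}
```

Under this assumption, adding a tagger for a new language amounts to one small subclass that returns a different path.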


setLocale

public void setLocale(Locale loc)

tag

public List<AnalyzedTokenReadings> tag(List<String> sentenceTokens)
                                throws IOException
Description copied from interface: Tagger
Returns a list of AnalyzedTokenReadings objects that assign each term in the sentence some kind of part-of-speech information (not necessarily just one tag).

Note that this method takes exactly one sentence. Implementations may treat the first word of the sentence as a special case, since it is usually written with an uppercase letter.

Specified by:
tag in interface Tagger
Parameters:
sentenceTokens - the text as returned by a WordTokenizer but without whitespace tokens.
Throws:
IOException
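
The parameter contract above (tokens without whitespace) can be sketched as follows. This assumes the tokenizer output is a plain List<String> that still contains whitespace tokens. TagInputSketch and withoutWhitespace are hypothetical names, not part of LanguageTool, and the call to tag is shown only as a comment because it needs a concrete tagger instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of LanguageTool.
class TagInputSketch {

    // Returns a copy of the token list with empty and whitespace-only tokens dropped.
    static List<String> withoutWhitespace(List<String> tokens) {
        List<String> result = new ArrayList<String>();
        for (String token : tokens) {
            if (!token.trim().isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One sentence, as a tokenizer might return it (whitespace included).
        List<String> tokenized = Arrays.asList("This", " ", "is", " ", "it", ".");
        List<String> sentenceTokens = withoutWhitespace(tokenized);
        System.out.println(sentenceTokens); // prints [This, is, it, .]

        // With a concrete tagger instance (assumed here to be named tagger):
        // List<AnalyzedTokenReadings> readings = tagger.tag(sentenceTokens);
    }
}
```

Because tag takes exactly one sentence, a caller with longer text would split it into sentences first and filter each sentence's tokens separately.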

createNullToken

public final AnalyzedTokenReadings createNullToken(String token,
                                                   int startPos)
Description copied from interface: Tagger
Create the AnalyzedToken used for whitespace and other non-words. Use null as the POS tag for this token.

Specified by:
createNullToken in interface Tagger

createToken

public AnalyzedToken createToken(String token,
                                 String posTag)
Description copied from interface: Tagger
Create a token specific to the language of the implementing class.

Specified by:
createToken in interface Tagger


Copyright © 2005-2009 Daniel Naber