de.danielnaber.languagetool.tokenizers
Class SentenceTokenizer

Object
  extended by SentenceTokenizer
All Implemented Interfaces:
Tokenizer
Direct Known Subclasses:
CzechSentenceTokenizer, DutchSentenceTokenizer, GermanSentenceTokenizer, PolishSentenceTokenizer

public class SentenceTokenizer
extends Object
implements Tokenizer

Tokenizes text into sentences by looking for typical end-of-sentence markers, while taking exceptions (e.g. abbreviations) into account.
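To illustrate the general idea, here is a minimal sketch of abbreviation-aware sentence splitting. It is not this class's actual implementation. The class name `SimpleSentenceSplitter`, its `split` method and the small abbreviation list are hypothetical. The sketch treats `.`, `!` or `?` followed by whitespace (or the end of the text) as a sentence end, unless the token before a `.` is a known abbreviation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the LanguageTool implementation.
class SimpleSentenceSplitter {
    // Hypothetical abbreviation list, stored without the trailing period.
    private final Set<String> abbreviations =
            new HashSet<>(Arrays.asList("Dr", "Mr", "e.g", "i.e", "etc"));

    List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isMarker = c == '.' || c == '!' || c == '?';
            // A marker only ends a sentence if whitespace or end of text follows.
            boolean atBoundary = i + 1 == text.length()
                    || Character.isWhitespace(text.charAt(i + 1));
            if (!isMarker || !atBoundary) continue;
            // Exception: a period that closes a known abbreviation.
            if (c == '.' && endsWithAbbreviation(text, start, i)) continue;
            String sentence = text.substring(start, i + 1).trim();
            if (!sentence.isEmpty()) sentences.add(sentence);
            start = i + 1;
        }
        // Whatever follows the last marker is kept as a final sentence.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) sentences.add(rest);
        return sentences;
    }

    // True if the token directly before the period at index dot is an abbreviation.
    private boolean endsWithAbbreviation(String text, int start, int dot) {
        int wordStart = dot;
        while (wordStart > start && !Character.isWhitespace(text.charAt(wordStart - 1))) {
            wordStart--;
        }
        return abbreviations.contains(text.substring(wordStart, dot));
    }
}
```

With this sketch, `"Dr. Smith arrived. He said hi! Did he? Yes"` yields four sentences, and no break is made after `Dr.`. One trade-off of such a rule is that an abbreviation which really does end a sentence (such as a final `etc.`) will not cause a break.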

Author:
Daniel Naber

Constructor Summary
SentenceTokenizer()
          Create a sentence tokenizer that uses the built-in abbreviations.
SentenceTokenizer(String[] abbrevList)
          Create a sentence tokenizer with the given list of abbreviations, in addition to the built-in ones.
 
Method Summary
static void main(String[] args)
           
 void setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs)
           
 List<String> tokenize(String s)
          Tokenize the given string into sentences.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SentenceTokenizer

public SentenceTokenizer()
Create a sentence tokenizer that uses the built-in abbreviations.


SentenceTokenizer

public SentenceTokenizer(String[] abbrevList)
Create a sentence tokenizer with the given list of abbreviations, in addition to the built-in ones.
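The key point of this constructor is that the supplied abbreviations extend the built-in list rather than replace it. The following sketch models only that merge. The class `AbbreviationList`, its `suppressesBreak` method and the built-in entries are hypothetical, and storing entries without a trailing period is an assumption of the sketch, not a documented format.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of how extra abbreviations combine with built-in ones.
class AbbreviationList {
    // Hypothetical stand-in for the tokenizer's built-in abbreviations.
    private static final String[] BUILT_IN = {"e.g", "i.e", "etc"};

    // Field initializer runs for both constructors, so built-ins are always present.
    private final Set<String> abbreviations = new HashSet<>(Arrays.asList(BUILT_IN));

    // Mirrors the no-argument constructor: built-in abbreviations only.
    AbbreviationList() {
    }

    // Mirrors the String[] constructor: entries are added, not substituted.
    AbbreviationList(String[] abbrevList) {
        Collections.addAll(abbreviations, abbrevList);
    }

    // A period after one of these tokens should not be treated as a sentence end.
    boolean suppressesBreak(String tokenBeforePeriod) {
        return abbreviations.contains(tokenBeforePeriod);
    }
}
```

For example, `new AbbreviationList(new String[] {"Fig"})` suppresses breaks after both `Fig` and the built-in `etc`.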

Method Detail

setSingleLineBreaksMarksParagraph

public void setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs)
Parameters:
lineBreakParagraphs - if true, a single line break is assumed to end a paragraph; if false, only two or more consecutive line breaks end a paragraph
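The effect of this flag on paragraph boundaries can be sketched with two regular expressions. `ParagraphSplitter` and `paragraphs` are hypothetical names. The setter only mirrors the documented parameter. The default of `false`, and the assumption of `\n` line endings, are choices of this sketch and not confirmed for the real class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of single vs. double line-break paragraph detection.
class ParagraphSplitter {
    // One or more line breaks: used when a single break ends a paragraph.
    private static final Pattern SINGLE_BREAK = Pattern.compile("\\n+");
    // Two or more consecutive line breaks: used otherwise.
    private static final Pattern DOUBLE_BREAK = Pattern.compile("\\n{2,}");

    // Assumed default for this sketch only.
    private boolean singleLineBreaksMarkParagraph = false;

    void setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs) {
        this.singleLineBreaksMarkParagraph = lineBreakParagraphs;
    }

    List<String> paragraphs(String text) {
        Pattern boundary = singleLineBreaksMarkParagraph ? SINGLE_BREAK : DOUBLE_BREAK;
        List<String> result = new ArrayList<>();
        for (String part : boundary.split(text)) {
            // Drop empty fragments, e.g. from leading line breaks.
            if (!part.trim().isEmpty()) result.add(part.trim());
        }
        return result;
    }
}
```

Given `"Line one\nline two\n\nNew paragraph"`, the sketch finds two paragraphs with the flag off and three with it on.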

tokenize

public List<String> tokenize(String s)
Tokenize the given string into sentences.

Specified by:
tokenize in interface Tokenizer
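Because `tokenize` is specified by an interface, calling code can depend on the interface and accept any tokenizer implementation. The sketch below shows that pattern with hypothetical stand-ins (`SimpleTokenizer`, `NaiveSentenceTokenizer`, `WhitespaceTokenizer`, `TokenCounter`). These are not the real `Tokenizer` interface or its implementations, and the sentence rule is deliberately naive, with no abbreviation handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Tokenizer-style interface.
interface SimpleTokenizer {
    List<String> tokenize(String text);
}

// Naive sentence tokenizer: breaks after '.', '!' or '?' followed by whitespace.
class NaiveSentenceTokenizer implements SimpleTokenizer {
    public List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }
}

// Naive word tokenizer: a second implementation of the same interface.
class WhitespaceTokenizer implements SimpleTokenizer {
    public List<String> tokenize(String text) {
        return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
    }
}

// Caller written against the interface only.
class TokenCounter {
    static int count(SimpleTokenizer tokenizer, String text) {
        return tokenizer.tokenize(text).size();
    }
}
```

Passing `"It rains. We stay in."` to `TokenCounter.count` gives 2 with the sentence tokenizer and 5 with the whitespace tokenizer. The caller itself does not change.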

main

public static void main(String[] args)


Copyright © 2005-2007 Daniel Naber