java.lang.Object
  SentenceTokenizer
public class SentenceTokenizer
Tokenizes text into sentences by looking for typical end-of-sentence markers, while taking exceptions (e.g. abbreviations) into account.
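The idea of splitting at end-of-sentence markers while skipping known abbreviations can be illustrated with a minimal sketch. The class `AbbrevSplitSketch`, its method `split`, and the sample abbreviations are hypothetical stand-ins chosen for illustration; they are not the implementation or the built-in abbreviation list of `SentenceTokenizer`, which this page does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for illustration only, not the SentenceTokenizer implementation. */
class AbbrevSplitSketch {

    // Assumed sample abbreviations (stored without the trailing dot).
    private final Set<String> abbreviations =
            new HashSet<>(Arrays.asList("e.g", "i.e", "Dr", "Mr"));

    /** Extra abbreviations are added to the sample ones, mirroring the two constructors. */
    AbbrevSplitSketch(String... extraAbbreviations) {
        abbreviations.addAll(Arrays.asList(extraAbbreviations));
    }

    List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean marker = c == '.' || c == '!' || c == '?';
            boolean atBoundary = i + 1 == text.length()
                    || Character.isWhitespace(text.charAt(i + 1));
            if (!marker || !atBoundary) {
                continue;
            }
            // A dot that directly follows a known abbreviation does not end the sentence.
            if (c == '.' && abbreviations.contains(wordBefore(text, start, i))) {
                continue;
            }
            sentences.add(text.substring(start, i + 1).trim());
            start = i + 1;
        }
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest); // trailing text without an end marker
        }
        return sentences;
    }

    /** Returns the whitespace-delimited word that ends right before index dot. */
    private static String wordBefore(String text, int from, int dot) {
        int begin = dot;
        while (begin > from && !Character.isWhitespace(text.charAt(begin - 1))) {
            begin--;
        }
        return text.substring(begin, dot);
    }

    public static void main(String[] args) {
        System.out.println(new AbbrevSplitSketch().split("Dr. Smith arrived. He was late!"));
        System.out.println(new AbbrevSplitSketch("approx").split("It took approx. 5 min. Fine."));
    }
}
```

In this sketch, passing `"approx"` to the constructor keeps `approx.` from being treated as a sentence end, which is the kind of effect a caller would presumably want from a custom abbreviation list.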
| Constructor Summary | |
|---|---|
| SentenceTokenizer() | Create a sentence tokenizer that uses the built-in abbreviations. |
| SentenceTokenizer(String[] abbrevList) | Create a sentence tokenizer with the given list of abbreviations, in addition to the built-in ones. |
| Method Summary | |
|---|---|
| static void | main(String[] args) |
| void | setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs) |
| List<String> | tokenize(String s) - Tokenize the given string into sentences. |
| Methods inherited from class Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public SentenceTokenizer()
public SentenceTokenizer(String[] abbrevList)
| Method Detail |
|---|
public void setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs)
Parameters: lineBreakParagraphs - if true, a single line break is assumed to end a paragraph; if false, only two or more consecutive line breaks end a paragraph.

public List<String> tokenize(String s)
Specified by: tokenize in interface Tokenizer

public static void main(String[] args)
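The two paragraph modes selected by `setSingleLineBreaksMarksParagraph` can be pictured with a small sketch. The class `ParagraphBreakSketch` and its method `splitParagraphs` are hypothetical and only model the rule stated for the `lineBreakParagraphs` parameter; how `SentenceTokenizer` actually detects or uses paragraph ends is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the two paragraph-break modes; not library code. */
class ParagraphBreakSketch {

    static List<String> splitParagraphs(String text, boolean singleLineBreaksMarkParagraph) {
        // true:  one or more line breaks end a paragraph
        // false: only two or more consecutive line breaks end a paragraph
        String separator = singleLineBreaksMarkParagraph
                ? "(?:\\r?\\n)+"
                : "(?:\\r?\\n){2,}";
        List<String> paragraphs = new ArrayList<>();
        for (String part : text.split(separator)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "Line one\nLine two\n\nLine three";
        System.out.println(splitParagraphs(text, true).size());
        System.out.println(splitParagraphs(text, false).size());
    }
}
```

With the sample text in `main`, the first mode treats every line as its own paragraph, while the second keeps the first two lines together because only a single line break separates them.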