Sentence Separation

When working with actually text, that text won’t come with clear sentence boundaries. So in addition to tokenization, you usually want to do sentence separation.

When working with standard English orthography, you can usually look at cues such as periods, question marks, exclamation points, etc. to tell you when a sentence ends.

en_str_canonical = "The fictional Pandoran biosphere, from James Cameron's Avatar, teems with a biodiversity of bioluminescent species ranging from hexapodal animals to other types of exotic fauna and flora. The Pandoran ecology forms a vast neural network spanning the entire lunar surface into which the Na'vi and other creatures can connect. The strength of this collective consciousness is illustrated when the human invaders are defeated in battle by the Pandoran ecology, after the Na'vi were nearly defeated. Cameron utilized a team of expert advisors in order to make the various examples of fauna and flora as scientifically feasible as possible."

Text from other sources (e.g. twitter or reddit) can be significantly more difficult to correctly sentence separate, depending on the conventions of the community, and you sometimes need a model specific to that source.

Stanza

# Processing English text
en_doc = stanza_en_nlp(en_str_canonical)

for sent in en_doc.sentences:
   print(' '.join([w.text for w in sent.words]))
The fictional Pandoran biosphere , from James Cameron 's Avatar , teems with a biodiversity of bioluminescent species ranging from hexapodal animals to other types of exotic fauna and flora .
The Pandoran ecology forms a vast neural network spanning the entire lunar surface into which the Na'vi and other creatures can connect .
The strength of this collective consciousness is illustrated when the human invaders are defeated in battle by the Pandoran ecology , after the Na'vi were nearly defeated .
Cameron utilized a team of expert advisors in order to make the various examples of fauna and flora as scientifically feasible as possible .

Spacy

for sent in en_doc.sentences:
   print(' '.join([w.text for w in sent.words]))
The fictional Pandoran biosphere , from James Cameron 's Avatar , teems with a biodiversity of bioluminescent species ranging from hexapodal animals to other types of exotic fauna and flora .
The Pandoran ecology forms a vast neural network spanning the entire lunar surface into which the Na'vi and other creatures can connect .
The strength of this collective consciousness is illustrated when the human invaders are defeated in battle by the Pandoran ecology , after the Na'vi were nearly defeated .
Cameron utilized a team of expert advisors in order to make the various examples of fauna and flora as scientifically feasible as possible .