NLP Library Comparison

Text summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content.

There are two general approaches to automatic summarization: extraction and abstraction.

Extraction-based summarization

Here, content is extracted from the original data, but the extracted content is not modified in any way. Examples of extracted content include key-phrases that can be used to "tag" or index a text document.
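
As a rough illustration of the extractive idea, the sketch below scores each sentence by the average frequency of its non-stopword terms and returns the top-scoring sentences unchanged, in their original order. This is a minimal, standard-library-only sketch and not how SpaCy or Sumy implement summarization. The function names (split_sentences, extractive_summary), the tiny stopword list, and the regex-based sentence splitter are all assumptions made for this example.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; real libraries ship larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "it", "that"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, max_sentences=2):
    """Return up to max_sentences sentences, copied verbatim from text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))  # term frequencies over the whole document

    def score(sentence):
        tokens = content_words(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats chase mice and cats climb trees. The weather was mild."
    for sentence in extractive_summary(sample, max_sentences=2):
        print(sentence)
```

Because the selected sentences are copied rather than rewritten, the output is always a subset of the input, which is the defining property of extraction described above.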

Abstraction-based summarization

Abstractive methods build an internal semantic representation of the original content, and then use this representation to create a summary that is closer to what a human might express. Abstraction may transform the extracted content by paraphrasing sections of the source document, which can condense a text more strongly than extraction. Such transformation, however, is computationally much more challenging than extraction. It involves natural language processing and, where the original document relates to a special field of knowledge, often a deep understanding of that domain. (Source: Wikipedia)

Libraries Compared: SpaCy and Sumy

Text Summarization

Extraction-based summarization

Abstraction-based summarization

SpaCy

Sumy