
The rapid growth of electronically available text has made it difficult for users to find the information that is potentially of interest to them. This situation subjects users to information overload. In almost every language, text in any given domain is written in full detail, so readers are obliged to wade through points that they are not interested in.

Readers of isiXhosa text are also susceptible to this problem. Many domains produce large volumes of text, which require summarization to save readers' time. Examples include the large volumes of news texts and online news articles produced by media agencies, reports from government offices, etc.

Newspapers, together with other news releases in the language, now reach readers from many sources. A large number of media organizations and presses release news in both electronic and non-electronic form. Because automatic text summarization services that could reduce the time readers spend searching and reading are lacking for isiXhosa, it can be shown that readers spend more time than necessary going through content they are not even interested in. The work presented in this paper is a contribution towards developing natural language processing applications for the isiXhosa language.

Specifically, this work extends the scope of text summarization research by exploring its usefulness for the isiXhosa language. We focus on novel techniques built on the Natural Language Toolkit (NLTK), making use of its tokenization modules. Term frequency and sentence position were used to assign weights to the sentences to be extracted for the summary. An improved stemmer for isiXhosa, which strips words to their root form, was also used.
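
The weighting idea can be sketched as follows. This is a minimal illustration, not the system described in this paper: the function names, the `position_weight` parameter and the way the two scores are combined are assumptions, plain regular expressions stand in for the NLTK tokenization modules, and the stemming step is omitted.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased runs of letters; a stand-in for an NLTK tokenizer.
    return re.findall(r"[^\W\d_]+", sentence.lower())


def score_sentences(text, position_weight=0.5):
    """Return (sentence, weight) pairs using term frequency and position."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    tf = Counter(word for toks in tokens for word in toks)
    total = sum(tf.values()) or 1
    n = len(sentences)
    scored = []
    for i, toks in enumerate(tokens):
        # Average relative frequency of the words in this sentence.
        tf_score = sum(tf[w] for w in toks) / (len(toks) * total) if toks else 0.0
        # Earlier sentences get a higher position score (1.0 for the first).
        pos_score = (n - i) / n
        scored.append((sentences[i], tf_score + position_weight * pos_score))
    return scored


if __name__ == "__main__":
    for sentence, weight in score_sentences("Cats chase mice. Dogs bark. Cats sleep."):
        print(f"{weight:.3f}  {sentence}")
```

In a fuller version, each token would be reduced to its root form by the stemmer before counting, so that inflected forms of the same word contribute to one frequency.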

RESULTS OF THE ENGLISH SYSTEM

Text ID    Original Length    Summary Length    Summary Ratio (%)
Text 1     1422               638               55.1
Text 2     1472               525               64.3
Text 3     2954               853               71.1
Text 4     1547               866               44.0
Text 5     1555               1026              34.0
Text 5     1874               814               56.6
Text 6     2044               865               57.7
Text 7     2282               829               63.7
Text 8     1656               864               47.8
Text 9     2285               558               75.6
Text 10    1865               899               51.8
Text 11    2034               595               70.7
Text 12    2171               807               62.8
Text 13    2584               938               63.7
Text 14    1422               638               55.1
Text 15    1472               525               64.3
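
The summary ratio values in the table are consistent with the percentage by which each text was shortened, that is, 100 × (1 − summary length / original length). The formula is not stated in the text, so this is an inference from the numbers, and the function name below is hypothetical.

```python
def summary_ratio(original_length, summary_length):
    """Percentage by which the summary is shorter than the original text."""
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    return round(100.0 * (1.0 - summary_length / original_length), 1)


if __name__ == "__main__":
    # Lengths taken from the rows for Text 1 and Text 4 above.
    print(summary_ratio(1422, 638))
    print(summary_ratio(1547, 866))
```

Under this reading, a higher ratio means a shorter summary relative to its source text.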

Text ID    Original Length    Summary Length    Summary Ratio (%)
Text 1     2572               855               66.7
Text 2     2160               909               57.9
Text 3     2574               855               66.7
Text 4     2166               226               89.5
Text 5     4359               1587              63.5
Text 5     2279               598               73
Text 6     3046               661               78.2
Text 7     1650               329               80.0
Text 8     2280               932               59.1
Text 9     2040               706               65.3
Text 10    1862               836               55.1
Text 11    1864               232               87.5
Text 12    1549               173               88.8
Text 13    4070               424               89.5
Text 14    2915               191               93.4
Text 15    2572               855               66.7

Figure 2. RELEVANCY OF OUR SYSTEM WITH MANUALLY GENERATED SUMMARIES

VIII. CONCLUSION AND FUTURE WORK
This study uses the extraction method for isiXhosa text summarization. Sentences are extracted according to their weight, and their original order is maintained. The first sentence is always kept, on the notion that the first sentence of a text carries particular significance and should therefore be given first priority.
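
The selection step described above can be sketched as follows, assuming the sentence weights have already been computed. The function name and the parameter `k` (the number of sentences to keep) are hypothetical, and this is one possible reading of the procedure rather than the exact implementation.

```python
def select_summary(sentences, weights, k):
    """Keep the first sentence plus the heaviest others, in original order."""
    if not sentences or k <= 0:
        return []
    chosen = {0}  # the first sentence is always kept
    # Rank the remaining sentences by weight, heaviest first.
    ranked = sorted(range(1, len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen.update(ranked[: k - 1])
    # Emit the chosen sentences in the order they appear in the source text.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sentences = ["A.", "B.", "C.", "D."]
    weights = [0.1, 0.9, 0.2, 0.8]
    print(select_summary(sentences, weights, 3))
```

Sorting the chosen indices before output is what preserves the source order, so the summary reads in the same sequence as the original text even though selection is by weight.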

The summarization method used is extraction-based. When important sentences are extracted, it is possible that one sentence contains a proper noun and the following sentence contains a pronoun that refers back to that proper noun.

In this scenario, if the system, when constructing a summary, selects the second sentence and omits the first, the meaning of the selected sentence is lost. This problem is not unique to this study; it is a major problem in the field of automatic text summarization. Addressing it is part of our future work.

ACKNOWLEDGEMENT
This work is based on the research undertaken within the Telkom CoE in ICTD, supported in part by Telkom SA, Tellabs, Saab Grintek Technologies, Easttel and Khula Holdings, THRIP, GRMDC and the National Research Foundation of South Africa (UID: 86108). The opinions, findings and conclusions or recommendations expressed here are those of the authors, and none of the above sponsors accepts any liability whatsoever in this regard.
