Survey on Natural Language Generation

In this paper, we discuss the basic concepts and fundamentals of Natural Language Generation (NLG), a field of Natural Language Engineering that deals with the conversion of non-linguistic data into natural language text. We start our investigation by introducing the NLG system and its different types. We also pinpoint the major differences between NLG and NLU, also known as Natural Language Understanding. Afterwards, we shed light on the architecture of a basic NLG system, its advantages and disadvantages. Later, we examine the different applications of NLG, presenting a case study that illustrates how an NLG system operates from an algorithmic point of view. Finally, we review some existing real-world NLG systems together with their features.

I. INTRODUCTION
NLG, or Natural Language Generation, is the process of constructing natural language outputs from non-linguistic inputs. One of the central goals of NLG is to investigate how computer programs can be made to produce high-quality, expressive, uncomplicated, and natural language text from sophisticated computer-internal representations of information [1].

II. NLG vs. NLU
NLG is the inverse of NLU (Natural Language Understanding) or NLI (Natural Language Interpretation), in that NLG maps from meaning to text, while NLU maps from text to meaning [2]. NLG is easier than NLU because an NLU system cannot control the complexity of the language structure it receives as input, while an NLG system limits the complexity of the structure of its output. Table 1 delineates the differences between NLG and NLU.

IV. TYPES OF NLG SYSTEMS
There are different types of NLG systems, ranging from the simplest ones, canned text and template filling systems, to sophisticated systems that adapt to realistic changes and variations in the information of a particular domain [3].

A. Canned Text
Generating text can be as simple as keeping a list of canned text that is copied and pasted, possibly linked or concatenated with some glue text. The results may be satisfactory in simple domains such as horoscope machines or generators of personalized business letters. Canned-text NLG systems are easy to implement, but are unable to adapt to new situations without the intervention of a programmer [4].
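As an illustration of the canned text approach, the following is a minimal sketch in Python. The phrases, the glue text, and the function name canned_horoscope are hypothetical and do not come from any surveyed system.

```python
# Hypothetical canned phrases for a toy horoscope machine.
CANNED_OPENINGS = {
    "aries": "Today favors bold decisions.",
    "taurus": "Patience will be rewarded today.",
}
CANNED_ADVICE = {
    "work": "a colleague may ask for your help.",
    "love": "an old friend may get in touch.",
}
GLUE = " Also, "  # fixed glue text that links two canned strings


def canned_horoscope(sign: str, topic: str) -> str:
    """Copy two stored strings and concatenate them with the glue text."""
    return CANNED_OPENINGS[sign] + GLUE + CANNED_ADVICE[topic]
```

Calling canned_horoscope("aries", "work") simply pastes the stored strings together. A sign that is not in the table raises a KeyError, which mirrors the limitation noted above: a programmer has to add new text before the system can handle a new situation.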

B. Template Filling
In this approach, a template is filled by entering data into slots and fields, and a natural statement is generated. Junk mail is generated using template filling systems, in which a message is sent with the addressee's name in the right place. Template filling is easy to implement but not flexible enough to handle applications with any realistic variation in the information being expressed or in the context of its expression. Figure 1 depicts the architecture of a template filling type NLG system.
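Template filling can be sketched with the Template class from the Python standard library. The letter text, the slot names, and the function fill_template are assumptions made for illustration only.

```python
from string import Template

# Hypothetical letter template; each $name is a slot to be filled.
LETTER = Template(
    "Dear $title $surname,\nYour $product subscription expires on $date."
)


def fill_template(slots: dict) -> str:
    """Fill every slot of the template; a missing slot raises KeyError."""
    return LETTER.substitute(slots)
```

The wording around the slots never changes, so the output varies only in the inserted values. This is what makes the approach easy to implement and, at the same time, inflexible.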

C. Advanced NLG Systems
As stated previously, canned text and template filling systems are not flexible enough to deal with emerging situations and real-world problems. Therefore, new NLG systems were investigated in order to solve more complex problems. These systems must make the following choices [5]:
Content Selection: The system must choose the appropriate content to express and generate natural output based on a specific communicative goal.
Lexical Selection: The system must choose the lexical items most appropriate for expressing particular concepts.
Sentence Structure Aggregation: The system must generate phrases, clauses and sentence-sized chunks.
Discourse Structure: The system must deal with multi-sentence discourse which has a coherent structure.

V. NLG SYSTEM ARCHITECTURE
A modern architecture for NLG systems comprises a knowledge base, a discourse planner, and a surface realizer. The discourse planner selects from a knowledge pool which information to include in the output, and creates a text structure to ensure coherence. On a more local scale, the planner processes the content of each sentence and orders its parts. The surface realizer takes the discourse specification as input and converts sentence-sized chunks of representation into grammatically correct sentences [6]. Figure 2 shows the basic architecture of an NLG system.

A. Knowledge Base
It contains all the information of a specific domain. It is a large general-purpose knowledge base that supports domain-specific applications, which helps to speed up and enhance the porting and testing of the generator on new applications.

B. Communicative Goal
It designates the intended audience that will use the system. Stylistic variations serve to express significant interpersonal and situational meanings (text can be formal or informal, slanted or objective, colorful or dry, etc.).

C. Discourse Planner
It selects the content from the knowledge base and then structures that content appropriately. The result is a specification of all the choices made for the entire communication, potentially spanning multiple sentences and including other annotations. In other words, the discourse planner takes a specified input and generates linear chunks of information. The two approaches used by discourse planners are Text Schemata and Rhetorical Relations [7].
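The selection and ordering performed by a discourse planner can be sketched as follows. This is a toy illustration under stated assumptions: the Fact class, the contents of KNOWLEDGE_BASE, the priority field, and the function plan_discourse are all hypothetical, and a real planner would use schemata or rhetorical relations rather than a numeric priority.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    topic: str     # what the fact is about
    content: str   # the message to be expressed
    priority: int  # lower values are mentioned first (an assumption)


# Hypothetical knowledge base for a small weather domain.
KNOWLEDGE_BASE = [
    Fact("rain", "rainfall is below average", 2),
    Fact("temperature", "the month is cooler than average", 1),
    Fact("wind", "winds are light", 3),
]


def plan_discourse(knowledge_base, goal_topics):
    """Select the facts relevant to the communicative goal, then order them."""
    selected = [fact for fact in knowledge_base if fact.topic in goal_topics]
    return sorted(selected, key=lambda fact: fact.priority)
```

With goal_topics set to {"rain", "temperature"}, the planner drops the wind fact and returns the temperature fact before the rain fact, giving the linear sequence of chunks that is later handed to the surface realizer.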

D. Text Schemata
It is a mechanism that structures the output by representing standard text patterns as high-level procedures, similar to the states of a state machine.

E. Rhetorical Relations
It is based on RST (Rhetorical Structure Theory), which designates a central segment of text called the nucleus and a more peripheral segment called the satellite. RST relations are defined in terms of the constraints they place on the nucleus, on the satellite, and on the combination of the nucleus and satellite [8].
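The nucleus and satellite structure can be represented as a small tree. The sketch below is an assumption-laden illustration: the Relation class, the MARKERS table, and the two helper functions are hypothetical, and only two relation names are covered.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Relation:
    name: str                             # e.g. "CONCESSION"
    nucleus: Union[str, "Relation"]       # central segment
    satellite: Union[str, "Relation"]     # peripheral segment


# Hypothetical discourse markers, one per relation name.
MARKERS = {"CONCESSION": "although", "CAUSE": "because"}


def nuclei(node):
    """Keep only the central text by following the nucleus at each level."""
    if isinstance(node, str):
        return node
    return nuclei(node.nucleus)


def linearize(node):
    """Render a relation as 'marker satellite, nucleus'."""
    if isinstance(node, str):
        return node
    marker = MARKERS[node.name]
    return f"{marker} {linearize(node.satellite)}, {linearize(node.nucleus)}"
```

For example, a CONCESSION relation whose nucleus is "rainfall amounts were mostly small" and whose satellite is "there was rain on every day for 8 days" is linearized with the satellite first, introduced by "although".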

F. Surface Realizer
It receives the fully specified discourse plan and generates individual sentences as constrained by its lexical and grammatical resources. In other words, the surface realizer converts text specifications into actual natural text. The linguistic tasks involved in the surface realization process are the following: inserting function words, choosing the correct inflection of content words, ordering words within a sentence, and applying orthographic rules. The two approaches used by surface realizers are Systemic Grammar and Functional Unification Grammar.
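The four realization tasks listed above can be sketched on a single clause. This is a toy example, not a real realizer: the specification keys (actor, process, goal, tense), the IRREGULAR_PAST table, and the inflection rules are simplifying assumptions that cover only a handful of English forms.

```python
# Hypothetical lexicon of irregular past-tense forms.
IRREGULAR_PAST = {"eat": "ate"}


def inflect(verb, tense, subject):
    """Choose the inflected form of a content word (toy rules only)."""
    if tense == "past":
        return IRREGULAR_PAST.get(verb, verb + "ed")
    # Present tense: add -s for third person singular pronouns.
    if subject.lower() in {"he", "she", "it"}:
        return verb + "s"
    return verb


def realize(spec):
    """Turn a clause specification into one English sentence."""
    subject = spec["actor"]
    verb = inflect(spec["process"], spec["tense"], subject)  # inflection
    noun_phrase = ["the", spec["goal"]]                      # function word
    words = [subject, verb] + noun_phrase                    # subject-verb-object order
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."          # orthographic rules
```

Given the actor "I", the process "eat", the goal "sandwich", and the present tense, realize returns "I eat the sandwich."; switching the tense to past yields "I ate the sandwich.".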

G. Systemic Grammar
It represents sentences as collections of functions and maintains rules for mapping those functions onto explicit grammatical forms. In Table 2, the actor performing the action is the subject "I", the process carried out by the actor is the verb "eat", and the object acted upon is "the sandwich" [9].

H. Functional Unification Grammar
It is based on feature grammars, where the basic idea is to build the generation grammar as a feature structure listing all possible alternations, and then to unify this grammar with an input specification built using the same sort of feature structure.
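The unification step described above can be sketched with nested dictionaries standing in for feature structures. The functions unify and choose_alternative, the GRAMMAR_ALTERNATIVES list, and the feature names are hypothetical; real unification grammars are considerably richer.

```python
def unify(a, b):
    """Unify two feature structures; return None if their values conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # conflicting values for the same feature
                result[key] = merged
            else:
                result[key] = value
        return result
    # Atomic values (and lists) unify only when they are equal.
    return a if a == b else None


# Hypothetical grammar: a list of alternations for an English clause.
GRAMMAR_ALTERNATIVES = [
    {"cat": "clause", "voice": "active", "order": ["actor", "process", "goal"]},
    {"cat": "clause", "voice": "passive", "order": ["goal", "process", "actor"]},
]


def choose_alternative(spec):
    """Return the first grammar alternation that unifies with the input."""
    for alternative in GRAMMAR_ALTERNATIVES:
        result = unify(alternative, spec)
        if result is not None:
            return result
    return None
```

An input specification that requests the passive voice fails to unify with the first alternation and succeeds with the second, so the result carries both the input features and the word order contributed by the grammar.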

VI. APPLICATIONS OF NLG SYSTEMS

Database Content Display:
The description of database contents in natural language is not a new problem, and some such generators already exist for specific databases. The general solution still poses problems, however, since even for relatively simple applications it still includes unsolved issues in sentence planning and text planning.

Expert System Explanation:
This is a related problem that often requires more interactive ability. The user's queries may not only elicit more information from a (static, and hence well-structured) database, but may also cause the expert system to perform further reasoning, which requires the dynamic explanation of system behavior, expert system rules, etc. This application also includes issues in text planning, sentence planning, and lexical choice.

Speech Generation:
Simplistic text-to-speech synthesis systems have been available commercially for a number of years, but naturalistic speech generation involves unsolved issues in discourse and interpersonal pragmatics (for example, the intonation contour of an utterance can express dislike, questioning, etc.). Today, only the most advanced speech synthesizers compute syntactic form as well as intonation contour and pitch level.

Limited Report and Letter Writing:
As mentioned in the previous section, with increasingly general representations for text structure, generator systems will increasingly be able to produce standardized multi-paragraph texts such as business letters or monthly reports. The problems faced here include text plan libraries, sentence planning, adequate lexicons, and robust sentence generators.

Automated document production:
Examples include weather forecasts, simulation reports, and letters.

Presentation of information to people in an understandable fashion:
Examples include medical records and expert system reasoning.

VII. CASE STUDY: WEATHER FORECAST
In this case study, we discuss a specific NLG system for weather forecasting, showing the different phases needed to transform the input specification into natural output text. Figure 3 depicts the structure of the weather forecast NLG system [10].

B. Phases
The Discourse Planner takes as input the language commands and generates different chunks of information, classified in a tree-like structure, which is depicted in Figure 4. The Surface Realizer takes as input the leaves of the tree produced previously and generates single, grammatically correct natural sentences:
The month was cooler than average. The month was drier than average. There was the average number of rain days. The total rain for the year so far is well below average. There was rain on every day for 8 days from 11th to 18th. Rainfall amounts were mostly small.
The Surface Realizer then processes the above sentences and produces a coherent natural English paragraph:
The month was cooler and drier than average, with the average number of rain days, but the total rain for the year so far is well below average. Although there was rain on every day for 8 days from 11th to 18th, rainfall amounts were mostly small.
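One step of this process, merging "cooler than average" and "drier than average" into a single sentence, can be sketched as below. This is not the algorithm of the system in [10]; the message format of (subject, comparative) pairs and the function aggregate are assumptions made for illustration.

```python
from itertools import groupby


def aggregate(messages):
    """Merge consecutive messages about the same subject into one sentence.

    messages: list of (subject, comparative) pairs, e.g. ("month", "cooler").
    """
    sentences = []
    # groupby only groups adjacent items, so the planner's order is preserved.
    for subject, group in groupby(messages, key=lambda message: message[0]):
        comparatives = [comparative for _, comparative in group]
        joined = " and ".join(comparatives)
        sentences.append(f"The {subject} was {joined} than average.")
    return sentences
```

For the two messages ("month", "cooler") and ("month", "drier"), the function produces the single sentence "The month was cooler and drier than average.", which matches the opening clause of the paragraph above. The remaining clauses would require further rules, for example for the contrast introduced by "but" and the concession introduced by "although".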

VIII. EXISTING NLG SYSTEMS
In this section, we present some existing real-world NLG systems.

A. FoG
Function: Produces textual weather reports in English and French
Input: Graphical/numerical weather depiction
User: Environment Canada (Canadian Weather Service)
Developer: CoGenTex
Status: Fielded, in operational use since 1992
Figure 5 shows the input of FoG, while Figure 6 shows its output.

C. Loughaty
Function: Generator of natural programming instructions [11]
Input: Template wizards that the user fills in to generate programming instructions
Usage: Learning the basic concepts of programming
Figure 9 shows the input of Loughaty, while Figure 10 shows its output.