Readability Assessment for Chinese L2 Sentences: An Extended Knowledge base and Comprehensive Evaluation Model-based Method

The paper assesses sentence readability based on the standards in the field of Chinese L2 teaching. In view of the inapplicability of the field standards in text readability assessment, the study focuses on two aspects. On the one hand, the graded lexicon of the HSK syllabus is extended to obtain a large-scale graded lexical knowledge base. On the other hand, sentence-based features in the existing teaching grammatical knowledge base are supplemented to achieve the automatic recognition of grammatical points and obtain quantitative grammatical indicators regarding sentence readability. Besides, based on the extended knowledge bases, comprehensive evaluation models are created to calculate the lexical and grammatical difficulties of sentences, as well as the sentence readability. The results of experiments show that the sentence readability is well differentiated in all levels of texts. Furthermore, the correlation between sentence readability and text readability is significantly improved in comparison with existing methods.


INTRODUCTION
Reading is the process of obtaining information from visual materials, i.e., using text to acquire information, understand the world, develop thinking, and obtain esthetic experience. As one of the four basic learning skills (listening, speaking, reading, and writing), reading plays an important role in Chinese L2 teaching [1]. Given the prominent position of reading in language learning, the provision of suitable reading materials for learners has become one of the key issues in Chinese L2 teaching. According to Vygotsky's zone of proximal development [2] and Krashen's comprehensible input [3], the level of reading materials provided for L2 learners should be slightly higher than the current level of the learners. However, for L2 teaching staff, it takes considerable energy and time to select reading materials with appropriate difficulty from a large number of teaching resources. Hence, an automatic assessment of text readability is urgently required.
Text representation and machine learning methods from natural language processing can produce an overall assessment of text readability, but they cannot explain what makes a text difficult and therefore lack the interpretability that L2 teaching requires. L2 teaching differs from mother-tongue teaching: L2 learners are influenced by their mother tongue in sentence comprehension and other aspects. Therefore, the readability features of Chinese L2 texts should take the characteristics of L2 acquisition into account [4]. The teaching and examination syllabuses used in Chinese L2 teaching follow the rules of L2 acquisition; they are the reference standards for L2 teaching and an important basis for the readability assessment of Chinese L2 texts. However, owing to the difficulty of automatically identifying grammatical points [5] and the limited number of words in the graded lexicon [6], these syllabuses cannot be applied directly to readability assessment.
Presently, studies on the readability of Chinese L2 texts mainly focus on the quantitative analysis of textbooks or on the readability of chapter-level texts from a macroscopic perspective, and few studies analyze sentence readability [7,8]. However, sentences are an important part of reading instruction in L2 teaching, and sentence readability is an important component of text readability [9]. Measuring sentence readability can help teachers select difficult sentences to focus on during teaching, or adapt and simplify difficult sentences to suit the level of L2 learners. Assessing sentence readability also allows text readability to be analyzed from a more microscopic, finer-grained perspective.
In this study, we assess the readability of Chinese L2 sentences and extend the graded lexicon of the Chinese Proficiency Test (HSK) syllabus, which is a standard in the field of Chinese L2 teaching, to obtain a more practical graded lexical knowledge base for text readability assessments. Besides, we improve the existing grammatical knowledge base of Chinese L2 teaching [10] for the automatic recognition of grammatical points in sentences, and extract and quantify the relevant indicators of sentence grammatical knowledge. Text readability assessment is a function that maps the text to a value. The input variable of the function is a set of readability features, and the output variable is usually a readability level or score, which indicates a comprehensive evaluation. In other words, to establish an indicator system for the evaluated object, certain methods or models are used to analyze the collected data, and an overall quantitative judgment for the evaluated object is made [11]. Based on the extended knowledge bases, comprehensive evaluation models are built in this study to calculate the lexical and grammatical difficulties of sentences, as well as sentence readability.
The remainder of the paper is organized as follows. Section 2 introduces related work and open problems in sentence readability assessment, in particular for Chinese L2 sentences. Section 3 describes the methods for extending the domain knowledge bases and the method for assessing sentence readability. Section 4 presents and analyzes the experimental results. Finally, Section 5 discusses the conclusions and limitations of this study.

RELATED WORKS
Presently, most research on text readability is conducted at the chapter level. The most representative approach is to evaluate chapter texts using readability formulas, e.g., Flesch-Kincaid Grade Level [12,13], Gunning Fog Formula [14,15], SMOG Formula [16], and other related formulas [17,18]. With recent advances in technology, researchers have applied machine learning methods to text readability assessment. For example, in [19], the authors discussed the application of decision tree, bagging decision tree, linear regression, SVM, and NB methods in text readability assessment.
The sentence is the basic unit in language learning, and sentence readability assessment is a fine-grained form of text readability assessment. Current research on sentence readability mainly focuses on the analysis of sentence complexity regarding structure or syntax. The measurement indicators of syntactic complexity have advanced from coarse-grained indicators based on length [20], to indicators based on clause complexity [21,22], to indicators based on phrase complexity [23,24].
In the cognitive study of L2 learning, it is generally accepted that language knowledge, such as lexical knowledge or grammatical knowledge, is a significant factor that affects reading comprehension [25]. Lexical knowledge is considered to be the best factor for reading measurement, as the correlation between lexical knowledge and reading comprehension lies between 0,5 and 0,85 [26]. The success of reading comprehension depends significantly on the reader's mastery of the words in the text. Thus, lexical knowledge is closely related to reading comprehension. In a study on L2 sentence readability assessment [27], the authors discussed four categories of features that affect the readability of Swedish sentences, namely, traditional features, syntactic features, lexical morphological features, and semantic features. They found that lexical morphological features (the percentage of B1 words in a sentence) had the greatest impact on the readability of L2 sentences. In addition to lexical knowledge, grammatical knowledge is also very important for text comprehension because it is crucial to the construction of discourse coherence and integration [25,28,29]. According to the Common Core State Standards, grammatical knowledge can aid reading comprehension and interpretation. When lexical information is available but the necessary grammatical clues are missing, reading comprehension becomes impossible. Because of the importance of lexical and grammatical knowledge in reading comprehension, it is feasible to predict sentence readability based on lexical and grammatical difficulties.
The study of Chinese sentence readability is still in its infancy. In [30], the author determined the sentence difficulty based on the number of lexical semantic categories in a sentence and determined that sentences with more semantic categories are relatively difficult to understand. In another study [31], the author deduced the main factors that affect sentence comprehension from a questionnaire and constructed two sentence readability formulas for primary school students and international students using linear regression. Meanwhile, based on the field standards of Chinese L2, Wanghao deduced quantitative indicators at the character and word aspects and used the CRITIC method to establish a relationship between sentence readability, character difficulty, and vocabulary difficulty [32]. Dong et al. extracted sentence difficulty features from three aspects, namely, character, word, and syntax, and used a machine learning classification algorithm to predict the sentence absolute and relative difficulties [33].
In the above studies, the application of the field standards in Chinese L2 teaching is not in-depth. Particularly, the application of lexical and grammatical knowledge in the field standards to the study of sentence readability is still inadequate, and the grading syllabus of Chinese L2 teaching is not applicable for assessing text readability. Additionally, the readability assessment of Chinese L2 texts currently uses readability formulas and traditional machine learning algorithms. Therefore, the study of sentence readability assessment needs to be improved and supplemented in several aspects.

METHOD

3.1 Extension of the Graded Lexical Knowledge Base
HSK is a standardized international Chinese proficiency test for non-native Chinese speakers (including foreigners, overseas Chinese, and Chinese ethnic minority candidates). The HSK syllabus has a complete rating system with six levels, from low to high. The graded lexicon of the HSK syllabus contains 5000 words. In addition, the syllabus gives some exemplary words; although these are not listed in the graded lexicon, they are graded with additional explanations. By sorting all these words, a basic word list comprising 5650 words was obtained. When sentences or texts are analyzed against this list alone, a large number of pseudo outline words appear, i.e., words that are absent from the list although learners who have mastered the listed words can learn them by analogy, and these affect the assessment of text readability or sentence complexity. For example, when learners have mastered "电影院 (cinema)" in the list, they can learn "电影 (movie)" by analogy. In fact, the 650 additional words in the HSK syllabus were derived by extending the 5000 basic words. Two methods are used for this extension. One is to split words: for example, "表格 (table)" belongs to level 4, and "表" obtained by splitting it is also level 4. The other is to combine words: for example, "游客 (tourist)" is obtained by combining "旅游 (tourism)" and "客人 (guest)", where "客人 (guest)" belongs to level 3 and "旅游 (tourism)" belongs to level 2. The higher level is taken as the level of the combined word; hence, "游客 (tourist)" belongs to level 3.
Following the extension method of the HSK syllabus, we adopted split and combination analogies to extend the word list, and two basic principles for extending the word list are given below: (1) Follow the word formation. For example, "火车站 (railway station)" is a combination word comprising "火车 (train)" and "站 (station)". The structural relationship between them is centering, i.e., the former word describes or restricts the latter word. When a one-layer split is performed by following the word formation, only "火车 (train)" and "站 (station)" can be obtained, and wrong splits such as "火 (fire)" and "车站 (station)" are not produced. Some examples of split methods are listed in Tab. 1.

Word | Word formation | The way of word meaning combination
窗户 (windows) | <n mod = "n…n"><n sen = "001">窗</n><n sen = "001">户</n></n> | 失指 (loss of reference)
保驾 (escort) | <v mod = "v｜n"><v sen = "001">保</v><n sen = "004">驾</n></v> | 泛指 (general reference)
爱人 (lover) | <n mod = "v↗n"><v sen = "001">爱</v><n sen = "001">人</n></n> | 特指 (special reference)
平白 (gratuitously) | <a mod = "a…a"><a>平</a><a sen = "004">白</a></a> | 喻指 (metonymy)
人事 (human affairs) | <n mod = "n↗n"><n sen = "001">人</n><n sen = "001">事</n></n> | 代指 (anaphora)

The extension process of the graded lexicon is shown in Fig. 1. By improving the basic word list and then applying split analogy and combination analogy, an extended graded lexical knowledge base is obtained. Each step requires a sentence-based tree bank [34], a Chinese word-formation knowledge base [35], and the Modern Chinese Dictionary. The sentence-based tree bank and the word-formation knowledge base were created by the Language and Character Resource Research Center of Beijing Normal University. The sentence-based tree bank includes Chinese mother-tongue teaching materials, international Chinese teaching materials, and literary works. In this study, the corpus of international Chinese teaching materials in the tree bank was used. The sentence-based tree bank uses the Modern Chinese Dictionary to tag the meaning items and POS of each word. The frequencies of the different POS and meaning items of each word were obtained by statistical analysis of the tagged corpus. Some POS and meaning items of words appeared more frequently in the corpus, while others appeared less frequently. Generally, the high-frequency POS and meaning items of words need to be mastered by learners. The Chinese word-formation knowledge base mainly comprises POS, meaning items, structural relation, word formation, and word meaning combinations.

Improvement of the Basic Word List
The basic word list of the HSK syllabus includes the POS, level, and analogy basis of each word. To improve the basic word list, values were assigned to further attributes of each word in the list, namely meaning item, word formation, and the way of word meaning combination, for use in the subsequent analogies. The main process of improving the basic word list is as follows: (1) Both the word formation and the way of word meaning combination are obtained from the Chinese word-formation knowledge base. First, based on the sentence-based tree bank, frequency information is supplemented for each word in the word-formation knowledge base. The frequency of each meaning item of a word, rather than the overall frequency of the word, is counted in the sentence-based tree bank. Owing to the particularity of the field, some meaning items do not appear in the international Chinese teaching corpus.
(2) The basic word list is then completed using the supplemented Chinese word-formation knowledge base. If a word has only one POS in the Chinese word-formation knowledge base, the information of the record whose meaning item has the highest frequency is taken as the attributes of the word in the basic word list. If the word has more than one POS in the Chinese word-formation knowledge base, the information of the most frequent meaning item whose POS matches that in the basic word list is added to the basic word list.
Some examples of the basic word list after obtaining the attributes are summarized in Tab. 3, where meaning code refers to the meaning code of words in the Modern Chinese Dictionary. For example, "爱好 (hobby)" is a noun (n) with a meaning code of 002, which means "strong interest in something". In terms of word formation, "爱好 (hobby)" is composed of two juxtaposed verbs (<n mod = "…v">), in which the meaning code of the first morpheme "爱 (love)" is 002 and that of the second morpheme "好 (like)" is 101. A transferred meaning of 0 indicates that the morpheme meanings have not changed after the word formation. "爱好 (hobby)" corresponds to two records in the Chinese word-formation knowledge base: a noun and a verb. In the basic word list, "爱好 (hobby)" is a noun. Therefore, we added the information corresponding to the noun in the Chinese word-formation knowledge base to "爱好 (hobby)".
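The selection rule in step (2) can be illustrated with a minimal sketch. The record layout (`pos`, `sense`, `freq`), the function name `select_record`, the frequencies, and the sense codes other than the noun code 002 are hypothetical and serve only to show the rule; they are not taken from the knowledge base itself.

```python
def select_record(records, pos):
    """Return the most frequent meaning-item record with the given POS, or None."""
    same_pos = [r for r in records if r["pos"] == pos]
    if not same_pos:
        return None
    return max(same_pos, key=lambda r: r["freq"])


# Hypothetical word-formation records for 爱好; frequencies are made up.
records = [
    {"pos": "n", "sense": "002", "freq": 31},
    {"pos": "v", "sense": "001", "freq": 12},
    {"pos": "n", "sense": "003", "freq": 4},
]

# The basic word list marks 爱好 as a noun, so only noun records compete.
chosen = select_record(records, pos="n")
print(chosen["sense"])  # 002
```

When a word has a single POS in the knowledge base, the same function reduces to picking the most frequent meaning item.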

Split Analogy
Split analogy is used to split words in the basic word list to obtain their morphemes and their structural relationships based on the word formation. If the meanings of the morphemes do not change when they form words, the words are split according to the structural relationship such that the level of the morphemes is the same as that of the original word. In this way, more morphemes with levels can be obtained. Three methods are used for split analogy: level-by-level split, iterative split, and only split one level. The specific process is shown in Fig. 2.
(1) Level-by-level split traverses the words of level 1 and determines whether they can be split according to the word formation and the transferred meaning. If the words can be split, only one layer is split for each word. After completion, it traverses the words of level 2 and splits them. The same process is repeated for the other levels in sequence.

Figure 2 Process of split analogy

(2) After the level-by-level split is completed, the split is restarted from the first level of words and loops iteratively until no more words can be split.
(3) Some words have a complex, nested word formation. For example, "出租车 (taxi)" has the centering structure, and "出租 (rent)" is itself a word whose word formation is <v mod = "v→v"><v sen = "001">出</v><v sen = "002">租</v></v>. "Only split one layer" means that "出租车 (taxi)" is split only into "出租 (rent)" and "车 (car)", and not into the three separate morphemes "出 (out)", "租 (rent)", and "车 (car)". In other words, the morpheme "出租 (rent)" is not split at this step. However, during the iterative split, "出租 (rent)" is split into the two morphemes "出 (out)" and "租 (rent)". Only one layer is split at a time because internal morphemes may be analogized from other words. This method conforms to a top-down, gradually refined cognitive style. After a word is split, the resulting pairs of word and meaning code are traversed. A pair of word and meaning code is added to the lexical knowledge base only if it does not already exist there.
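The three split strategies above can be combined into one loop, sketched below under simplifying assumptions. The dictionary layouts, the function name `split_analogy`, the level assigned to 出租车, and the meaning codes of 出租车, 出租, and 车 are hypothetical; only the meaning codes of 出 (001) and 租 (002) come from the word-formation markup quoted above.

```python
def split_analogy(kb, formation):
    """Extend kb {(word, sense): level} by splitting words one layer at a time.

    formation maps (word, sense) to (parts, transferred), where parts is a
    tuple of (morpheme, sense) pairs and transferred is True when the
    morpheme meanings change inside the word.
    """
    kb = dict(kb)                       # do not mutate the caller's dict
    changed = True
    while changed:                      # iterative split: repeat until nothing new
        changed = False
        for level in range(1, 7):       # level-by-level traversal, levels 1..6
            # Snapshot the keys so entries added in this pass wait for the next one.
            for key in [k for k, lv in kb.items() if lv == level]:
                parts, transferred = formation.get(key, ((), False))
                if transferred:
                    continue            # meaning changed, so no analogy is drawn
                for part in parts:      # only one layer is split per pass
                    if part not in kb:  # existing pairs are never overwritten
                        kb[part] = level
                        changed = True
    return kb


# Hypothetical input: 出租车 at level 1 with a nested word formation.
kb = {("出租车", "001"): 1}
formation = {
    ("出租车", "001"): ((("出租", "001"), ("车", "001")), False),
    ("出租", "001"): ((("出", "001"), ("租", "002")), False),
}
extended = split_analogy(kb, formation)
for (word, sense), level in extended.items():
    print(word, sense, level)
```

The first pass of the `while` loop corresponds to the level-by-level split; later passes perform the iterative split, so 出租 is added in the first pass and 出 and 租 only in the second.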

Combination Analogy
Combination analogy could in theory enumerate all permutations and combinations of the entries in the extended lexical knowledge base (derived from split analogy) and then analyze whether each result is a valid word and what its level is. However, the number of permutations and combinations is very large, and most of them do not form words. Moreover, some words lack word-formation information; hence, it is impossible to determine whether they can be combined. Therefore, combination analogy first obtains a large-scale vocabulary with word formations and transferred meanings and then analyzes the level of the words in this vocabulary. The large-scale vocabulary used in this study was extracted from the XML data of the sentence-based tree bank and the Chinese word-formation knowledge base. It comprises 118400 entries, each including word, meaning code, POS, word formation, and transferred meaning.
Combination analogy was used to analyze the 118400 words. If the morphemes of a word do not take on transferred meanings when they form the word, the word was split according to its word formation to obtain the pairs of morpheme and meaning code. Each pair of morpheme and meaning code was then checked against the extended lexical knowledge base obtained by split analogy. If every pair exists there, every morpheme has level information; the word was then added to the extended lexical knowledge base, and its level was set to the maximum of its morpheme levels.
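A minimal sketch of this check is given below. The tuple layout of the vocabulary, the function name `combination_analogy`, and all meaning codes are hypothetical; the example assumes that the morphemes 游 and 客 have already received levels 2 and 3 through split analogy.

```python
def combination_analogy(kb, vocabulary):
    """Add words whose morphemes all have levels; the word level is the maximum.

    kb maps (morpheme, sense) to a level. vocabulary is an iterable of
    (word, sense, parts, transferred) tuples, where parts is a tuple of
    (morpheme, sense) pairs taken from the word formation.
    """
    added = {}
    for word, sense, parts, transferred in vocabulary:
        if transferred or not parts or (word, sense) in kb:
            continue                      # no analogy possible, or already graded
        if all(part in kb for part in parts):
            added[(word, sense)] = max(kb[part] for part in parts)
    return {**kb, **added}


# Hypothetical levels obtained from split analogy.
kb = {("游", "001"): 2, ("客", "001"): 3}
vocabulary = [
    ("游客", "001", (("游", "001"), ("客", "001")), False),
    ("顾客", "001", (("顾", "001"), ("客", "001")), False),  # 顾 has no level
]
extended = combination_analogy(kb, vocabulary)
print(extended[("游客", "001")])  # 3
```

Words with an ungraded morpheme, such as the second entry here, stay outside the knowledge base and remain outline words.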

Improvement of the Teaching Grammatical Knowledge Base
The Language and Character Resource Center of Beijing Normal University created a grammatical knowledge base for Chinese L2 teaching based on the HSK syllabus, the Grading Syllabus, and the Practical Chinese Grammar for Foreigners. The grammatical points of the knowledge base are divided into nine categories, namely: morpheme, content word, function word, phrase, sentence component, single sentence, sentence pattern, tense, and complex sentence. The levels of the grammatical points are marked manually according to the HSK syllabus.
Different grammatical points correspond to different text features. In [10], the authors added text features to grammatical points and used regular expression pattern matching to find the grammatical points contained in a text. However, grammatical points that concern word formation, POS, phrase type, or sentence composition involve deep syntactic structure information and cannot be recognized by regular expression matching on the surface characters. Therefore, as a supplement to the above work, we added sentence-based features expressed in XPath to the grammatical points (XPath finds specific information by traversing the elements and attributes of an XML document) and used them to automatically identify the grammatical points contained in sentences from the corpus of the sentence-based tree bank. Some examples of sentence-based features represented by XPath are given in Tab. 4. Each grammatical point is described by word, POS, morphology, word meaning, syntax, or other relevant contents, depending on the specific situation.
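The idea of attaching an XPath feature to each grammatical point can be sketched with Python's standard `xml.etree.ElementTree` module, which supports a limited XPath subset. The `<s>` wrapper element, the feature names, and the function name `recognize` are hypothetical; the word elements reuse the word-formation markup shown earlier, and the actual tree bank schema and the features in Tab. 4 may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence fragment; real tree bank sentences carry more structure.
SENTENCE = """<s>
  <n mod="n↗n"><n sen="001">人</n><n sen="001">事</n></n>
  <n mod="v↗n"><v sen="001">爱</v><n sen="001">人</n></n>
</s>"""

# Hypothetical grammatical points, each described by one XPath feature.
FEATURES = {
    "noun with formation v↗n": "./n[@mod='v↗n']",
    "noun with formation n↗n": "./n[@mod='n↗n']",
    "verb with formation v｜n": "./v[@mod='v｜n']",
}


def recognize(sentence_xml, features):
    """Return, for each grammatical point, the words matched by its XPath."""
    root = ET.fromstring(sentence_xml)
    return {
        name: ["".join(hit.itertext()) for hit in root.findall(xpath)]
        for name, xpath in features.items()
    }


found = recognize(SENTENCE, FEATURES)
for name, words in found.items():
    print(name, words)
```

Counting the matches per grammatical point, grouped by the level of each point, would then give the quantitative grammatical indicators of a sentence.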

Measurement of Sentence Readability
In this paper, readability assessment is regarded as a comprehensive evaluation problem, and the entropy weight models were created to measure lexical difficulty, grammatical difficulty and sentence readability.

Lexical Difficulty
(1) The matrix WL was obtained by counting the number of words at each level in each sentence.

$$\mathbf{WL} = (wl_{ij})_{m \times 7} \qquad (1)$$

where $wl_{i1}$ to $wl_{i7}$ represent the number of first-level, second-level, third-level, fourth-level, fifth-level, sixth-level, and outline words in the i-th sentence, respectively.
(2) The indicators in the matrix were standardized. The more first-level words a sentence contains, the lower the difficulty of the text, so $wl_{i1}$ is a negative indicator. The standardization formula of $wl_{i1}$ is given by Eq. (2), and the standardization formula of the other indicators is given by Eq. (3).
The lexical difficulty of the i-th sentence was represented using:
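The steps of the lexical difficulty model can be sketched as follows. The formulas used here are the textbook form of the entropy weight method (min-max standardization, column proportions, information entropy, redundancy, normalized weights) and are an assumption; the exact equations of this paper may differ in detail. The function names and the count matrix are hypothetical.

```python
import math


def standardize(matrix, negative_cols=(0,)):
    """Min-max standardize each column; negative indicators are reversed."""
    m, n = len(matrix), len(matrix[0])
    out = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        if hi == lo:
            continue                      # constant indicator carries no information
        for i in range(m):
            if j in negative_cols:
                out[i][j] = (hi - col[i]) / (hi - lo)
            else:
                out[i][j] = (col[i] - lo) / (hi - lo)
    return out


def entropy_weights(std):
    """Weights from information entropy; needs at least two rows."""
    m, n = len(std), len(std[0])
    redundancy = []
    for j in range(n):
        total = sum(row[j] for row in std)
        if total == 0:
            redundancy.append(0.0)
            continue
        ps = [row[j] / total for row in std]
        entropy = -sum(p * math.log(p) for p in ps if p > 0) / math.log(m)
        redundancy.append(1.0 - entropy)
    s = sum(redundancy)
    if s == 0:
        raise ValueError("no indicator differentiates the sentences")
    return [d / s for d in redundancy]


def difficulty(matrix, negative_cols=(0,)):
    """Return (scores, weights): weighted sums of standardized indicators."""
    std = standardize(matrix, negative_cols)
    w = entropy_weights(std)
    return [sum(wj * x for wj, x in zip(w, row)) for row in std], w


# Hypothetical counts of level 1..6 and outline words for three sentences.
WL = [
    [5, 1, 0, 0, 0, 0, 0],
    [2, 2, 1, 0, 0, 0, 1],
    [0, 1, 2, 1, 1, 1, 3],
]
scores, weights = difficulty(WL)
print(scores)
print(weights)
```

The same functions apply to the grammatical count matrix, since the grammatical difficulty follows the same steps.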

Grammatical Difficulty
(1) The matrix GL was obtained by counting the number of grammatical points at each level in each sentence.

$$\mathbf{GL} = (gl_{ij})_{m \times 7} \qquad (9)$$

where $gl_{i1}$ to $gl_{i7}$ represent the number of first-level, second-level, third-level, fourth-level, fifth-level, sixth-level, and outline grammatical points in the i-th sentence, respectively. The other steps were the same as those of the lexical difficulty. Thus, the grammatical difficulty of the i-th sentence is given as:

Sentence Readability
The matrix SWGL was obtained by calculating the lexical and grammatical difficulties of each sentence.

$$\mathbf{SWGL} = (wc_i, gc_i)_{m \times 2} \qquad (11)$$

where $wc_i$ and $gc_i$ represent the lexical and grammatical difficulties of the i-th sentence, respectively.
The indicators in the matrix were standardized using:

After standardization, the weight of each indicator ($w_i$, i = 1, 2) in the sentence difficulty was calculated based on the entropy weight method, and the sentence readability ($sc_i$) was given as:

The entropy weight represents the degree to which an indicator differentiates the evaluated objects and does not reflect the importance of the indicator. Hence, unreasonable weights may appear. The comprehensive weight of the indicators was obtained by combining the importance and entropy weights, and it is given by:

where $\alpha_i$ (i = 1, 2) represents the importance weight of the indicators, which will be determined through experiments.
Finally, the sentence readability (sc i ) was assessed using the comprehensive weight, and it is given by:
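One common way to combine importance weights with entropy weights is to multiply them and renormalize; the sketch below uses that form as an assumption, since the paper's own equation may differ. The entropy weights 0,406 and 0,594 are those reported in the experimental section, whereas the importance weights and the function names are hypothetical.

```python
def comprehensive_weights(entropy_w, importance_w):
    """Multiply importance and entropy weights, then renormalize to sum to 1."""
    products = [a * w for a, w in zip(importance_w, entropy_w)]
    total = sum(products)
    return [p / total for p in products]


def sentence_readability(wc, gc, weights):
    """Weighted sum of standardized lexical (wc) and grammatical (gc) difficulty."""
    return weights[0] * wc + weights[1] * gc


entropy_w = [0.406, 0.594]   # lexical, grammatical (reported entropy weights)
importance_w = [0.7, 0.3]    # hypothetical importance weights
beta = comprehensive_weights(entropy_w, importance_w)
print(beta)
print(sentence_readability(0.5, 0.2, beta))
```

With these illustrative values the lexical weight rises above the grammatical weight, which mirrors the direction of the adjustment described in the experiments.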

EXPERIMENTAL RESULTS AND ANALYSIS

4.1 Experimental Data
The experimental data were from Boya Chinese textbooks in the sentence-based tree bank. The textbooks have nine volumes which are divided into three levels: primary, intermediate, and advanced. The sentences in each level of the text are listed in Tab. 5.

Extended Graded Lexical Knowledge Base
After the split and combination analogies were performed, the final extended lexical knowledge base comprised 43892 words. The number of words in each level is listed in Tab. 6. Based on the graded lexicon of the HSK syllabus and the extended lexical knowledge base, we performed a statistical analysis of the words in the sentences of the different text levels; the results are summarized in Tab. 7. With the extended lexical knowledge base, the proportion of outline words in the sentences is significantly reduced at all text levels, so that the lexicon reflects sentence readability more accurately. Additionally, the proportion of outline words is smallest in the sentences of the primary texts and largest in the sentences of the advanced texts. This indicates that the analogy process is reasonable and consistent with cognitive characteristics.

Table 7 The distribution of words in the sentences of different levels of texts

To test how well the extended lexicon discriminates between the sentences of different text levels, the difference between the word distributions in the sentences of adjacent text levels was calculated using relative entropy. For the word distributions of two adjacent text levels, the relative entropy can be expressed as:

$$D(P \,\|\, Q) = \sum_{i=1}^{7} p_i \ln \frac{p_i}{q_i}$$

where i = 1, 2, …, 7 indexes the words of levels 1 to 6 and the outline words, $p_i$ and $q_i$ denote the proportions of words of level i in the two text levels being compared, and the base of the logarithm is e. The results are summarized in Tab. 8. Compared with the graded lexicon of the HSK syllabus, the extended graded lexicon better distinguishes primary from intermediate text sentences, as well as intermediate from advanced text sentences, and therefore reflects sentence difficulty better at the lexical level.
The reason is that when sentences are analyzed based on the extended lexical knowledge base, the proportion of pseudo outline words is significantly reduced, and each meaning item of a word is assigned its own difficulty level.
For example, when "转机 (transfer)" means "好转的机会 (the opportunity to improve)", its level of difficulty is 6, whereas when it means "乘飞机中途转乘其他飞机 (transfer to another plane)", its level of difficulty is 4. This allows text difficulty to be measured at a finer granularity.
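The relative entropy between the word distributions of two adjacent text levels can be computed as in the sketch below, which uses the natural logarithm as stated above. The two distributions are hypothetical and are not the values of Tab. 7; the function assumes that every $q_i$ is positive wherever $p_i$ is positive.

```python
import math


def relative_entropy(p, q):
    """Relative entropy D(P||Q) with natural logarithm; terms with p_i = 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical proportions of level 1..6 and outline words in two text levels.
primary = [0.50, 0.20, 0.10, 0.08, 0.05, 0.04, 0.03]
intermediate = [0.30, 0.20, 0.15, 0.12, 0.09, 0.07, 0.07]

print(relative_entropy(primary, intermediate))
```

A larger value indicates that the two levels are easier to tell apart from their word-level distributions, which is how the two lexicons are compared in Tab. 8.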

Lexical Difficulty
Based on the lexical difficulty model, the number of words at each level in each sentence was counted. After standardization, the contribution of each indicator was calculated, and the information entropy, redundancy, and weight of each indicator were obtained. The results are summarized in Tab. 9, in which the weight increases with the word level. The weighted sum of the indicators gives the lexical difficulty of the sentence: the larger the value, the greater the lexical difficulty of the sentence. Tab. 10 compares the lexical difficulty of sentences in all levels of the text. The average lexical difficulties of the primary, intermediate, and advanced text sentences were 0,469, 1,298, and 1,516, respectively. Thus, the sentence difficulty can be well reflected by the vocabulary. Furthermore, the maximum lexical difficulty and standard deviation of the primary text sentences were 3,079 and 0,474, respectively, which indicates that the range of lexical difficulty in the primary text is limited. The range of lexical difficulty is largest in the advanced text sentences, whereas that in the intermediate text sentences lies in between.

Grammatical Difficulty
Based on the grammatical difficulty model, the information entropy, redundancy, and weight of each indicator were obtained. The results are summarized in Tab. 11, in which the weight increases with the grammatical level, i.e., the higher the grammatical level, the greater the weight. Furthermore, the grammatical difficulty of sentences in different levels of the texts was compared. The average grammatical difficulties of the primary, intermediate, and advanced text sentences were 0,179, 0,365, and 0,341, respectively. Because the average for the advanced texts is lower than that for the intermediate texts, the sentence difficulty cannot be well reflected by grammar alone. This is because the difficulty of a large part of the grammatical points is determined by vocabulary. For example, "忽然 (suddenly)" is a time adverb in the grammatical knowledge base, but it is also included in the graded lexical knowledge base. To avoid counting such items twice, the grammatical knowledge base was manually filtered and 425 grammatical points were selected. When these grammatical points were used to distinguish sentence difficulty, the degree of distinction was poor. Therefore, grammar must be combined with the lexicon when assessing sentence readability.

Sentence Readability
Based on the assessment model of the sentence readability, the information entropy, redundancy, and entropy weights of the lexical and grammatical difficulties were obtained, and the results are summarized in Tab. 12. The entropy weight of the lexical difficulty was 0,406, while that of the grammatical difficulty was 0,594. The weight of the grammatical difficulty is relatively large because the entropy weight method is based entirely on objective data and the values of the grammatical difficulty indicator vary considerably. As the entropy weight method does not consider the importance of indicators, the importance weight of indicators was introduced. After several experiments and comparisons, the importance and comprehensive weights were obtained, as shown in Tab. 12. The comprehensive weight increases the weight of the lexical difficulty and reduces the weight of the grammatical difficulty. This is consistent with the research results in [29], i.e., lexical knowledge is more strongly associated with reading comprehension than grammatical knowledge. This study also confirms previous research showing that the ability of grammatical indicators to predict the readability of Chinese L2 text is relatively weak [27].
The sentence readability is obtained as the weighted sum of the indicators. The average, standard deviation, and confidence interval of the sentence readability in each level of the text were calculated (the 95% confidence interval was constructed under the assumption that the sentence readability in each level of the text follows a Gaussian distribution). To verify the effectiveness of the method proposed in this paper, the method proposed in [30] was also applied to the experimental corpus. The results are summarized in Tab. 13, in which both methods distinguished sentence readability in different levels of the text.
The method in [30] calculates sentence readability by counting the lexical semantic categories in the sentence; the smaller the value, the higher the sentence readability. As the semantic classification dictionary in [30] could not be obtained, we counted the number of semantic categories in the sentence using HowNet. Additionally, the correlation between sentence readability and text readability was analyzed using: where sc represents the average of all the sentence readability values and dc represents the average readability of the texts in which the sentences are located. The results are shown in Tab. 14. The correlation coefficient between sentence readability and text readability obtained by the proposed method is 0,37, which is a significant improvement over the method in [30]. A T statistic was constructed to test the significance of the correlation. The T statistic fell outside the interval bounded by the critical values (−2,33 < T < 2,323), indicating that, at the 99% confidence level, there is a significant positive correlation between the sentence readability and the readability of the text in which the sentence is located.
Based on the practice of Chinese L2 teaching, we selected the two most important reference bases in the teaching process, namely lexical and grammatical knowledge, and calculated the lexical and grammatical difficulties of the sentences. The sentence readability was then obtained from the lexical and grammatical difficulties. The experimental results show that the method has good applicability in the field of Chinese L2 teaching.
In [30], the number of semantic categories in a sentence was used to measure the sentence difficulty. For example, although "我的爸爸是军人 (my father is a soldier)" and "我的爸爸是书法 (my father is calligraphy)" have the same length, the number of semantic categories of the first sentence is relatively small and the sentence difficulty is relatively low. Hence, the method in [30] is more suitable for determining the relative difficulty of sentence pairs.
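The correlation analysis and significance test described in this section can be sketched as follows. The sketch assumes that the correlation is the Pearson coefficient and that the T statistic takes the usual form t = r·sqrt(n − 2)/sqrt(1 − r²); both are assumptions, as are the function names and the sample size n, which is hypothetical. Only the coefficient 0,37 and the critical value of about 2,33 come from the text above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n pairs is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical sample size; the real sentence counts are given in Tab. 5.
n = 1000
t = t_statistic(0.37, n)
print(t > 2.33)  # True: the correlation would be significant for this n
```

Because the T statistic grows with the square root of the sample size, a moderate coefficient such as 0,37 can be highly significant on a corpus of this kind without implying that sentence readability determines text level.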

CONCLUSION
In this study, the standards in the field of Chinese L2 teaching were extended and improved. First, a method for building a large-scale graded lexical knowledge base was proposed. The method extends the graded lexicon by improving the basic lexicon of the HSK syllabus and then applying split analogy and combination analogy, and it obtains the levels of the different meaning items of words. The experimental results show that the extended lexical knowledge base can better distinguish different levels of texts. Additionally, the existing teaching grammatical knowledge base was supplemented with sentence-based feature information to automatically identify grammatical points in the sentence and obtain the relevant quantitative indicators of grammatical knowledge. Based on these knowledge bases, the applicability of the comprehensive evaluation method to sentence readability measurement was explored. The experimental results show that the sentence readability calculated by the comprehensive evaluation method is well distinguished in all levels of texts, and that sentence readability and text readability are significantly positively correlated. The method proposed in this paper addresses the practical needs of teaching Chinese as a second language and has good interpretability. It not only provides a measurement of sentence difficulty, but also shows the specific lexical and grammatical difficulties in sentences.
Due to the lack of sentence readability tagging corpus, this study only analyzes the correlation between sentence readability and text level, and cannot directly evaluate the accuracy of sentence readability. To analyze the accuracy of sentence readability assessment, it is necessary to establish an L2 sentence readability tagging corpus in the future. The extended large-scale graded lexical knowledge base is suitable for text readability assessment. However, its application to Chinese L2 teaching needs to be further verified by experts.