An Analytical Study of Code Smells

The software development process involves developing, building and enhancing high-quality software for specific tasks and, as a consequence, generates a considerable amount of data. This data can be managed systematically to create knowledge repositories that can be used to competitive advantage. Lessons learned during development can also become part of the knowledge bank and be used to advantage in subsequent projects by developers and software practitioners. Code smells are a group of symptoms which reveal that code is not good enough and requires action to clean it up. Software metrics help to detect code smells, while refactoring methods are used to remove them. Furthermore, various tools are available for detecting code smells. A code smell repository organizes the knowledge available in the literature about code smells and related concepts. This paper presents an analytical study of code smells which extracts useful, actionable and indicative knowledge.


INTRODUCTION
Today's software development process produces a large amount of data. Lessons learned and best practices in software development are spread across the literature in various forms such as code smells, design patterns and idioms. Organizing this knowledge into a knowledge repository, extracting insights from it and making them available to developers and software practitioners can assist the software development process. A code smell is a general mechanism for identifying structural design issues in software projects [1,2]. The term was coined by Kent Beck while helping Fowler with his refactoring book and has since become an important part of the software maintenance vocabulary. The existence of a code smell does not interrupt the functionality of a system, but it increases the risk of decay and reduces the quality of the system over time [3,4]. Many software metrics are available in the literature for detecting code smells [5,6]. Moreover, several tools support automatic or semi-automatic detection of code smells. Applying appropriate refactoring actions is the right way to deal with code smells. Refactoring actions can remove code smells and improve the quality of software design during maintenance [7][8][9].
This paper presents an analytical study of code smells and their related concepts. The aim of this study is to extract insightful information from the interrelations between code smells, software metrics, refactoring actions and detection tools. The paper is organized as follows: background and related work are described in Section 2. The design of the code smell repository is presented in Section 3. Section 4 applies different analytical techniques to the code smell related information tables and extracts indicative information that can further enhance the usefulness of a code smell repository. Section 5 concludes.

BACKGROUND AND RELATED WORKS
In 1999, Beck and Fowler [9] described code smells as indications in source code that do not prevent it from functioning but may reveal many problems in the future. They presented 22 code smells and some refactoring actions that can be used to improve the design.
Mantyla [10] classified the 22 code smells into seven categories based on their similar features.
Mens and Tourwé [11] presented a survey on refactoring. It covers all aspects of the refactoring process, such as general ideas, refactoring actions, different formalisms and methods, considerations, and how refactoring fits into the software development process. Walter and Pietrzak [12] pointed out that certain code smells, such as Divergent Change, are introduced during the maintenance phase. They proposed that multiple pieces of code need to be analyzed to detect the change. Marinescu [13] promoted the formalization of the definition of code smells. He extended detection to a broader range of code smells and a number of design principle violations.
Olbrich et al. [14] demonstrated that the performance of open source projects is degraded in the presence of bad smells. They examined this effect in three software projects, selecting God Class and Brain Class for their experimental study. They observed that without normalization by size, both smells are harmful to code, whereas with normalization by size the outcome is reversed. They therefore concluded that size is a major factor in measuring the harmfulness of these smells. An investigation of God Class and Data Class by Ferme et al. [15] shows that bad smells are destructive for source code. They suggested different filters to reduce or refine the detection rules for code smells.
Mahmood et al. [16] investigated several refactoring tools and established their purpose of usage. They also examined the automation of tools for different code smells. Ganea et al. [17] described the considerable disadvantages that code smells cause in source code. They presented a tool named "InCode", an Eclipse plug-in designed for Java programs, which is capable of increasing the quality of source code and decreasing code smells.
Yamashita and Moonen [18] presented an empirical study of the interrelations of code smells and their effect on the occurrence of maintainability issues. They found that certain inter-smell relations were connected with issues in the maintenance process and that some inter-smell relations appeared across coupled artifacts. Yamashita et al. [19] conducted a study to detect a broader range of inter-smell relations. They observed that some code smells show the same interrelations across various domains and deserve attention. These interrelations can therefore help practitioners improve the quality of software systems.

BUILDING CODE SMELL REPOSITORY
An extensive literature survey was carried out to gather all the information about Code smells and the related concepts. An initial list of 22 code smells was proposed by Kent Beck and Martin Fowler [20] which has since grown with contributions from several researchers and practitioners into almost 65 code smells. With the increase in number of code smells, Mantyla [10] proposed a classification of code smells into six categories.
Software metrics use measurable software attributes as indicators of latent software quality attributes [21][22][23]. Deterioration of quality caused by the presence of code smells can be quickly detected by using one or more related software metrics [24][25][26]. The literature survey identified around 49 software metrics that are applicable to code smell detection. Software metrics are categorized in many ways; one such classification separates class level metrics from method level metrics.
Tool support is essential, as several code smells can go undetected while programming [27]. Tools are available for automatic or semi-automatic detection of code smells. The detection methods applied by tools are generally based on calculating a specific set of composite metrics and comparing them against threshold values [28]. Numerous tools are available, but 9 detection tools are popular among developers.

Figure 1 Code smell repository schema
Maintenance is the most important step in the software development process [29], and maintainability can be improved by the use of refactoring methods. The term 'Refactoring' was introduced by Opdyke [30] in his PhD thesis. Later, Fowler [9] described refactoring as a disciplined method for restructuring the internal structure of existing source code without changing its external behavior [27,31]. Around 87 refactoring actions can be collected from the literature, and they are classified into six groups.
In the schema of the code smell repository, the code smell is the main object, linked to software metrics, refactoring actions and detection tools. The relations between code smells and each related concept are many to many. Each related concept has its own details such as name, definition and category. Fig. 1 displays the code smell repository schema. Further, the 'links' attribute can be used to navigate to different sources of detailed information about code smells.
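The schema described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class names `Concept` and `CodeSmell`, the helper `smells_using`, and the sample rows are hypothetical and are not taken from the repository itself.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A related concept: a software metric, refactoring action or detection tool."""
    name: str
    definition: str = ""
    category: str = ""


@dataclass
class CodeSmell:
    """Main object of the repository, linked to its related concepts."""
    name: str
    definition: str = ""
    category: str = ""
    links: list = field(default_factory=list)         # sources of detailed information
    metrics: list = field(default_factory=list)       # Concept objects
    refactorings: list = field(default_factory=list)  # Concept objects
    tools: list = field(default_factory=list)         # Concept objects


def smells_using(metric_name, smells):
    """Reverse lookup from a metric to the smells linked to it."""
    return [s.name for s in smells if any(m.name == metric_name for m in s.metrics)]


# Hypothetical sample rows, for illustration only.
loc = Concept("LOC", "number of lines of code")
wmc = Concept("WMC", "weighted methods per class")
smells = [
    CodeSmell("Large Class", metrics=[loc]),
    CodeSmell("Long Method", metrics=[loc]),
    CodeSmell("God Class", metrics=[wmc]),
]
```

Because a smell holds a list of metrics and one metric can appear in several smells, the forward and reverse lookups together express the many to many relation; `smells_using("LOC", smells)` returns both smells that share the LOC metric.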
The code smell repository thus constructed is available at https://serene-tundra-28026.herokuapp.com. The tables relating code smells to software metrics may be subjected to several analytical techniques to extract useful insights. Also, 28 code smells can be removed by 74 refactoring actions, as shown in Fig. 2.

Identifying Most Significant Set of Software Metrics
Many code smells can be detected by one or more metrics. As a sample, the formula for God Class detection combines several metrics through thresholds, among them FEW, which is 5, and VERY_HIGH, which is 47.
In contrast, LOC alone can detect Large Class [32]. One or more metrics may detect one or more code smells. Therefore, they have a many to many relationship.
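The God Class formula itself is not reproduced in this text. The following is a minimal sketch, assuming it is the widely cited detection strategy of Lanza and Marinescu, which combines ATFD (access to foreign data), WMC (weighted method count) and TCC (tight class cohesion). FEW and VERY_HIGH use the values stated above; the ONE_THIRD cohesion threshold and the Large Class limit of 500 lines are assumptions made for illustration.

```python
FEW = 5                 # stated in the text
VERY_HIGH = 47          # stated in the text
ONE_THIRD = 1 / 3       # assumed cohesion threshold, not stated in the text
LARGE_CLASS_LOC = 500   # hypothetical limit, for illustration only


def is_god_class(atfd, wmc, tcc):
    """Composite rule: uses much foreign data, is very complex, has low cohesion."""
    return atfd > FEW and wmc >= VERY_HIGH and tcc < ONE_THIRD


def is_large_class(loc):
    """Single-metric rule: LOC alone decides."""
    return loc > LARGE_CLASS_LOC
```

The contrast between the two functions mirrors the point made above: one smell needs a conjunction of three metrics, while another is detected from a single size metric.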
Notably, the most obvious metric for code smell detection is the size metric. LOC (number of lines of code) leads the metrics in code smell detection, as it is used in detecting as many as 8 code smells. Fig. 3 gives a visual presentation of the significance of different software metrics in the detection of code smells. Appendix A lists the metric abbreviations used.
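The significance of a metric, as used here, is the number of code smells it helps to detect. A minimal sketch of that count follows; the table `smell_metrics` is a hypothetical toy example and does not reproduce the repository data, so the counts it yields are not the paper's results.

```python
from collections import Counter

# Hypothetical smell -> metrics table, for illustration only.
smell_metrics = {
    "Large Class": ["LOC"],
    "Long Method": ["LOC", "CYCLO"],
    "God Class": ["WMC", "ATFD", "TCC"],
    "Data Class": ["WMC", "NOAM"],
}


def metric_significance(table):
    """Return (metric, number of smells it detects), most used first."""
    counts = Counter()
    for metrics in table.values():
        counts.update(metrics)
    return counts.most_common()


ranking = metric_significance(smell_metrics)
```

Applied to the full repository table, the first entry of such a ranking would be the metric with the widest reach, which the text reports to be LOC.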

Identifying Representative Metric for Each Code Smell Category
Code smells are organized into 6 categories [33][34][35][36]. Not all code smells are categorized in the literature. A decision tree classification method is applied on this table, with the code smell category acting as the class label. The classification result shows that each code smell category can have one or more representative metrics. The result is shown in Tab. 2; for example, NOM is the representative metric for the Couplers category. This information can be used to advantage in predicting the categories of code smells not yet categorized, and also in designing detection metrics for code smells from a particular category.
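The idea of a representative metric can be illustrated with the split criterion that a decision tree applies at its root. The sketch below is a simplification and not the classifier used in the study: for each category it picks, among the metrics used in that category, the one with the highest information gain for separating that category from the rest. The rows are hypothetical, and the function names are introduced here for illustration.

```python
from math import log2

# Hypothetical rows: (category, metrics used to detect the smell).
rows = [
    ("Bloaters", {"LOC", "NOP"}),
    ("Bloaters", {"LOC", "CYCLO"}),
    ("Couplers", {"NOM", "ATFD"}),
    ("Couplers", {"NOM", "CBO"}),
    ("Dispensables", {"NOAM"}),
]


def entropy(labels):
    """Binary entropy of a list of True/False labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(metric, category, rows):
    """Gain from splitting the rows on presence of `metric`, category vs rest."""
    labels = [cat == category for cat, _ in rows]
    present = [lab for lab, (_, ms) in zip(labels, rows) if metric in ms]
    absent = [lab for lab, (_, ms) in zip(labels, rows) if metric not in ms]
    n = len(rows)
    remainder = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    return entropy(labels) - remainder


def representative_metrics(rows):
    """Map each category to its highest-gain metric among those it uses."""
    result = {}
    for category in sorted({cat for cat, _ in rows}):
        used = sorted(set().union(*(ms for cat, ms in rows if cat == category)))
        result[category] = max(used, key=lambda m: information_gain(m, category, rows))
    return result
```

On these toy rows each category has one metric that separates it perfectly from the others. With real data the gains are rarely perfect, and a full decision tree would continue splitting below the root.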

Identifying Association between Software Metrics
The code smell table with related metrics can also be analyzed to identify associations between different metrics. Running the Apriori algorithm with minimum support 0.15 (3 instances) and minimum confidence 0.9 generated 11 one-itemsets, 13 two-itemsets and 4 three-itemsets, together with the corresponding association rules. The three-itemsets given in Tab. 3 bring out the most frequently occurring groups of related metrics.
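The Apriori procedure referred to above can be sketched in a few lines. This is a minimal level-wise implementation, not the tool used in the study. Each transaction is the set of metrics linked to one code smell; the five transactions below are hypothetical, and the toy run uses a minimum support of 0.4 because the sample is so small, whereas the study used 0.15.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: sup for s, sup in current.items() if sup >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune candidates
        # that have an infrequent (k-1)-subset.
        level = {
            a | b
            for a in current
            for b in current
            if len(a | b) == k
            and all(frozenset(sub) in current for sub in combinations(a | b, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), confidence))
    return found


# Hypothetical transactions, for illustration only.
transactions = [
    {"WMC", "ATFD", "TCC"},
    {"WMC", "ATFD", "TCC"},
    {"WMC", "ATFD"},
    {"LOC"},
    {"LOC", "NOM"},
]
frequent = apriori(transactions, min_support=0.4)
found = association_rules(frequent, min_confidence=0.9)
```

In the toy data, TCC never occurs without WMC and ATFD, so a rule from TCC to that pair has confidence 1. Rules of this kind are what the three-itemsets in the study summarize.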

Identifying Clustering between Code Smells and Software Metrics
The objective of clustering analysis is to recognize patterns in data and form groups based on those patterns. If two observations have similar features, they follow the same pattern and are therefore placed in the same group. Clustering can show which characteristics frequently appear together. Fig. 4 is the result of clustering the detected code smells on the metrics used to detect them. After cutting the dendrogram at k = 4, one can find the sets of code smells that appear together. These code smells are similar in the detection metrics they use, so there is a greater chance of them occurring together.
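Cutting a dendrogram at k = 4 amounts to stopping an agglomerative procedure once four clusters remain. The sketch below shows this under assumptions the text does not state: Jaccard distance between the metric sets of two smells, and average linkage between clusters. The smell to metric table is hypothetical and is not the data behind Fig. 4.

```python
def jaccard_distance(a, b):
    """1 minus the share of metrics two smells have in common."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def agglomerate(smell_metrics, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[name] for name in sorted(smell_metrics)]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(smell_metrics[x], smell_metrics[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical smell -> metrics table, for illustration only.
smell_metrics = {
    "God Class": {"ATFD", "WMC", "TCC"},
    "Brain Class": {"WMC", "TCC", "LOC"},
    "Large Class": {"LOC", "NOM"},
    "Long Method": {"LOC", "CYCLO"},
    "Feature Envy": {"ATFD", "LAA", "FDP"},
    "Data Class": {"NOAM", "WOC"},
}
groups = agglomerate(smell_metrics, k=4)
```

On this toy table the two class-level smells that share WMC and TCC end up together, as do the two size-based smells that share LOC, while the remaining two smells stay on their own.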

Inter Relation between Code Smells and Refactoring Actions
Refactoring is an important task of the maintenance phase that aims at improving latent software quality attributes such as understandability, flexibility and reusability [40]. One or more refactoring actions have been suggested for eliminating one or more code smells, so the relationship between them is many to many. 'Move method' is an important refactoring action that addresses as many as 9 code smells. Tab. 4 shows the most important refactoring actions, each of which can be used to get rid of a large set of code smells. Also, Fig. 5 presents a word cloud of the significance of different refactoring methods in removing code smells.

Identifying Association between Refactoring Actions
The code smell table with related refactoring actions can also be analyzed to identify associations between different refactoring actions. Running the Apriori algorithm with minimum support 0.1 and minimum confidence 0.9 generated 15 one-itemsets and 4 two-itemsets. The best four association rules, generated with confidence 1, are given in Tab. 5. These associations indicate the pairs of refactoring actions that are closely linked.

CONCLUSION
Organizing the knowledge about code smells and related concepts that is spread across the literature into a code smell repository gives rise to tables holding useful information. Applying analytical techniques to these tables can help improve this knowledge bank further.
An analytical study of code smells and their related concepts gives insightful knowledge about code smells that can improve the software development process. The result of this paper is that the code smell repository and the extracted insights can assist developers and software practitioners.

Notice
This paper was presented at IC2ST-2021, the International Conference on Convergence of Smart Technologies, organized in Pune, India by Aspire Research Foundation on January 9-10, 2021. The paper will not be published anywhere else.