Convexity deficit of benzenoids

In 2012, a family of benzenoids was introduced by Cruz, Gutman, and Rada, which they called convex benzenoids. In this paper we introduce the convexity deficit, a new topological index intended for benzenoids and, more generally, fusenes. This index measures by how much a given fusene departs from convexity. It is defined in terms of the boundary-edges code. In particular, convex benzenoids are exactly the benzenoids having convexity deficit equal to 0. Quasi-convex benzenoids form the family of non-convex benzenoids that are closest to convex, i.e., they have convexity deficit equal to 1. Finally, we investigate the convexity deficit of several important families of benzenoids.


Introduction
Benzenoids form an important family of graphs and molecules. Polycyclic (aromatic) hydrocarbons [9,10,19,20], of which the benzenoids form a subset, are important molecular systems with a rich organic chemistry [23,39] characterised by specific reactivity [34,38], spectra [35], and photophysics. They occur naturally, geologically and as by-products of natural and anthropogenic combustion processes, with considerable implications for the environment [1] and human health [27] and have been postulated as significant contributors to the carbon inventory in the wider Universe [2]. There has also been a huge amount of interest over several decades in the graph theory of benzenoids and related structures and its application to prediction of physical and chemical properties (see, e.g., the textbooks [12,13,16,26,30,40]). Much of the mathematical chemistry literature is concerned with prediction or rationalisation of electronic structure, but there is also interest in classification of the shapes available to benzenoids. As pointed out before [18], molecular shape is intimately associated with molecular electric and steric properties, such as quadrupole moment or van der Waals envelope, which are implicated in structure-activity relationships from odour perception [37] to carcinogenicity [3,27,32,33]. Codes based on boundaries seem especially suitable for systematising our notions of shapes of benzenoids. The reader is referred to the books [15,30] for definitions and basic facts.

Preliminaries
We begin by giving a mathematical definition of a fusene [7,8]. The class of fusenes contains the class of benzenoids as a proper subclass. Benzenoids can now be defined in terms of fusenes.
Definition 2.2. A fusene that can be embedded in the infinite hexagonal lattice is called a benzenoid.
In other words, benzenoids are those fusenes which are also subgraphs of the infinite hexagonal lattice.
Example 2.1. Figure 1 shows four subcubic plane graphs. Pentalene is not a fusene, because its bounded faces are pentagons. Biphenyl is not a fusene because it is not 2-connected. Anthracene and [6]helicene are both fusenes. Anthracene is also a benzenoid, whilst [6]helicene is not.

Let us denote the class of all benzenoids by B and the class of all fusenes by F. The inner dual of a plane graph is its dual graph with the vertex that corresponds to the outer face removed. A catacondensed fusene is a fusene whose inner dual is a tree. Fusenes that are not catacondensed are called pericondensed. We will denote the class of all catacondensed fusenes by F*. Catacondensed fusenes can be further divided into branched and non-branched fusenes. A catacondensed fusene is called non-branched if its inner dual is a path; otherwise it is called branched. The class of non-branched fusenes will be denoted by F . These definitions are naturally inherited by benzenoids. The classes of catacondensed benzenoids and non-branched benzenoids will be denoted, respectively, by B* and B . In this paper, we restrict ourselves to catacondensed benzenoids when using the terms branched and non-branched.

Boundary-edges code revisited
Each fusene can be assigned a boundary-edges code (BEC), a sequence of numbers counting the number of boundary edges between two consecutive vertices of degree 3, following the perimeter in an arbitrary, say counter-clockwise, direction. This useful tool for describing a benzenoid was introduced by P. Hansen and his co-workers [31]. The code depends on the starting vertex and the chosen direction. However, it can be made unique by choosing the lexicographically maximal code among all possible codes, which is often called the canonical code. Each benzenoid can be uniquely described by such a boundary-edges code, but this does not hold for fusenes [29]. Benzene is an exceptional benzenoid as it is the only benzenoid with no vertex of degree 3. If need be, we assign the code 6 to benzene. In the present paper the (lexicographically maximal) boundary-edges code of B will be denoted by code(B).
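The canonical form described above can be sketched in a few lines of Python. This is only an illustration, under the assumption that "all possible codes" means every starting vertex (rotation) in both traversal directions (reversal). The function names `rotations` and `canonical` are hypothetical and do not refer to the authors' software.

```python
def rotations(code: str) -> list[str]:
    """All cyclic rotations of a code given as a string of digits."""
    return [code[i:] + code[:i] for i in range(len(code))]


def canonical(code: str) -> str:
    """Lexicographically maximal code over all rotations and both directions.

    Assumption: symbols are single digits, so plain string comparison
    coincides with lexicographic comparison of the symbol sequences.
    """
    if not code:
        return code
    candidates = rotations(code) + rotations(code[::-1])
    return max(candidates)


if __name__ == "__main__":
    # 1535 and 3515 are other readings of the phenanthrene code 5351.
    print(canonical("1535"), canonical("3515"))
```

Two readings of the same boundary then compare equal after canonicalisation, which is what makes the code usable as an identifier for benzenoids.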
Here, we take a different approach and start from the definition of a code: a code is a string over the alphabet {1, 2, 3, 4, 5}.
Note that we permit codes that are not boundary-edges codes of any benzenoid (or fusene). By c ⊕ d we denote the concatenation of codes c and d.
Moreover, σ^i(c) for i ≥ 0 denotes the right circular shift of code c by i positions, and ρ(c) denotes the reverse of c.
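The three operations on codes can be illustrated with a short Python sketch in which a code is represented as a string of digits. The names `concat`, `shift` and `reverse` are illustrative choices, and a negative shift is interpreted here as a left circular shift, which is an assumption about the intended meaning of σ with a negative exponent.

```python
def concat(c: str, d: str) -> str:
    """c ⊕ d: concatenation of two codes."""
    return c + d


def shift(c: str, i: int = 1) -> str:
    """σ^i(c): right circular shift by i positions (negative i shifts left)."""
    if not c:
        return c
    k = i % len(c)  # Python's % is non-negative for a positive modulus
    return c[-k:] + c[:-k] if k else c


def reverse(c: str) -> str:
    """ρ(c): the reverse of a code."""
    return c[::-1]


if __name__ == "__main__":
    c = "12345"
    print(concat("53", "51"))  # 5351
    print(shift(c, 1))         # 51234
    print(reverse(c))          # 54321
```

With this representation, applying `reverse` twice, or a shift followed by the opposite shift, returns the original string.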
Note that $\rho^2(c) = c$ and $\sigma^i(\sigma^{-i}(c)) = c$ for every code c. We will use some properties of codes.

No simple way is known to check whether a given code is the boundary-edges code of some benzenoid. However, there is an obvious necessary condition: win(c) = 6, where, as the computation in the proof shows, win(c) = sum(c) − 2 len(c).

Proof. The proof proceeds by induction on the number of hexagons, h.
It is known that any benzenoid B with h hexagons can be obtained from some benzenoid B′ with h − 1 hexagons by adding a new hexagon using one-, two-, three-, four- or five-contact addition, where k-contact means that k edges of the new hexagon are identified with k consecutive edges of B′ (see [30, pp. 12-13]). Let code(B) = c and code(B′) = c′. Assume that win(c′) = 6.
If B is obtained by one-contact addition, then c can be obtained from c′ by replacing the symbol s (the one that corresponds to the part of the boundary where the new hexagon was attached) with $s_1\,5\,s_2$, where $s_1 + s_2 = s - 1$. Then win(c) = (sum(c′) + 5 − 1) − 2(len(c′) + 2) = win(c′) = 6.
Analogous arguments can be used for other types of addition.
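The one-contact step of this argument can be checked on strings. In the sketch below, `win` is taken to be sum(c) − 2 len(c), which is the form used in the computation above. The helper `one_contact` is hypothetical: it only performs the symbol replacement s → s_1 5 s_2 and does not test whether the resulting code is geometrically realisable.

```python
def win(code: str) -> int:
    """sum(c) - 2*len(c), the quantity shown to equal 6 in the proof."""
    digits = [int(ch) for ch in code]
    return sum(digits) - 2 * len(digits)


def one_contact(code: str, pos: int, s1: int) -> str:
    """Replace the symbol s at index pos by s1, 5, s2 with s1 + s2 = s - 1.

    Purely a string operation (hypothetical helper); both new symbols
    must stay inside the alphabet, so s1 >= 1 and s2 >= 1 are required.
    """
    s = int(code[pos])
    s2 = s - 1 - s1
    if s1 < 1 or s2 < 1:
        raise ValueError("s1 and s2 must both be at least 1")
    return code[:pos] + f"{s1}5{s2}" + code[pos + 1:]


if __name__ == "__main__":
    naphthalene = "55"
    for s1 in (1, 2, 3):
        c = one_contact(naphthalene, 0, s1)
        print(c, win(c))
```

Starting from naphthalene 55, the choice s_1 = 1 gives 1535, a rotation of the phenanthrene code 5351, and s_1 = 2 gives 2525; both keep win equal to 6.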

Graph invariants
A graph invariant is a function from a class of graphs to a class of values (e.g. integers, real numbers, polynomials) that takes the same value for any two isomorphic graphs. Graph invariants may be categorised by the codomains of the functions that define them. When the codomain is the Boolean domain, they are called graph properties. (For example, a graph can either be bipartite or non-bipartite.) Numerous integer invariants exist for graphs: order, size, diameter, girth, genus, chromatic number, etc. Perhaps the best-known integer invariant in chemical graph theory is the Wiener index [36]. An example of a real number invariant is the Estrada index [17]. In the literature one can find thousands of graph invariants.

Convex benzenoids
In 2012, a special sub-family of benzenoids, called convex benzenoids, was introduced by Cruz, Gutman and Rada [11]. This family was further studied and enumerated in [5]. A convex benzenoid can be characterised via its boundary-edges code.
Definition 3.1. Benzenoid B is convex if its boundary-edges code contains no 1.
The above statement is Proposition 3 in [5]. Since this is one possible characterisation of convex benzenoids we may use it as a definition here.
We note in passing that for infinite benzenoids the situation is more complex. As shown in [4], infinite benzenoids may have more than one boundary component and may need several infinite codes for their description. Sometimes the code does not describe an infinite benzenoid up to isomorphism. An example of an infinite convex benzenoid that is not determined by its boundary-edges code is called a strip in [4]. Strips of different width have the same boundary-edges code. Hence, in this paper we focus mainly on finite benzenoids.

Convexity deficit
Both convex and non-convex benzenoids play important and sometimes distinct roles in organic chemistry. For example, in the simplest case of benzenoid isomers, convex anthracene, comprising three linearly fused hexagons, is less stable than non-convex phenanthrene (see Figure 2). In qualitative theories, this is variously attributed to the larger number of Kekulé structures in phenanthrene (5 vs. 4), its higher Fries number (3 vs. 2) or its higher Clar number (2 vs. 1), all of which are inextricably linked to its angular, non-convex shape. We think it will be useful to introduce a measure that tells us by how much the shape of a benzenoid departs from convexity. We call this measure the convexity deficit.

Note that in the first formula d ⊆ c denotes any subcode consisting of cyclically consecutive symbols of code c. Note that the convexity deficit generalises the notion of convexity for benzenoids. Clearly, cd(B) = 0 is equivalent to saying that there is no 1 in the code.

Hence any (finite) benzenoid is k-convex at least for k = len(c) − 1.
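A direct computation of the convexity deficit can be sketched as follows. The sketch assumes that cd(B) is the length of the longest cyclically consecutive subcode d of code(B) with sum(d)/len(d) < 2, which is how the measure is used in the arguments on extremal fusenes and benzenoids later in the paper. The function name `convexity_deficit` is illustrative, and the quadratic scan over all cyclic windows is chosen for clarity rather than speed.

```python
def convexity_deficit(code: str) -> int:
    """Length of the longest cyclic window d of the code with sum(d) < 2*len(d).

    Assumed reading of the definition: returns 0 when no window
    qualifies, i.e. when the code contains no symbol 1.
    """
    digits = [int(ch) for ch in code]
    n = len(digits)
    best = 0
    for start in range(n):
        total = 0
        # Grow the window one symbol at a time, wrapping around cyclically.
        for length in range(1, n + 1):
            total += digits[(start + length - 1) % n]
            if total < 2 * length and length > best:
                best = length
    return best


if __name__ == "__main__":
    for name, code in [("naphthalene", "55"), ("phenanthrene", "5351")]:
        print(name, code, convexity_deficit(code))
```

Under this reading, a code without the symbol 1 has no qualifying window, which agrees with the statement that cd(B) = 0 characterises convex benzenoids.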
We note that for infinite benzenoids there exists no upper bound on the convexity deficit.

Proof. An example is the infinite benzenoid with boundary-edges code …2221222… shown in Figure 3. It is the complement of the anvil AN [4].

Quasi-convex and pseudo-convex benzenoids
Now we turn our attention to the non-convex benzenoids that are closest to convex, i.e. the benzenoids with the next smallest convexity deficit.
Note that quasi-convex benzenoids admit a simple characterisation via the boundary-edges code.
Proposition 3.4. A benzenoid is quasi-convex if and only if its boundary-edges code contains at least one 1 but no sub-sequence 11, 12, or 21.
Proof. A quasi-convex benzenoid is not convex, hence its code contains a 1. Let a and b be two cyclically consecutive numbers in the code of this benzenoid. Since it is 1-convex, a + b ≥ 4, hence 11, 12, and 21 are forbidden. The converse also follows.
Convex benzenoids can be classified into families with a common fundamental shape, where zig-zag ($2^k$) sub-sequences define the edges of the shape. Similarly, all quasi-convex benzenoids have a fundamental shape where the edges are defined by either zig-zag or armchair ($1(31)^k$) sub-sequences. The fundamental shape of a convex benzenoid has at most 6 edges; for a quasi-convex benzenoid it has at most 12. Zig-zag and armchair termination have consequences for the stability of benzenoids [21] and the conductivity of nanotubes [22]. A quasi-convex benzenoid that has no zig-zag sub-sequences in its boundary-edges code will be called pseudo-convex.
Definition 3.4. A benzenoid whose boundary-edges code contains at least one 1 but neither the sub-sequence 11 nor the symbol 2 is called pseudo-convex.
Proposition 3.5. Every pseudo-convex benzenoid is quasi-convex but the converse is not true.
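The code-based characterisations in this section translate into a small classifier. The following Python sketch applies Definition 3.1, Proposition 3.4 and Definition 3.4 to a code given as a string of digits; the function names and the label "other" for codes outside these three families are illustrative choices. Sub-sequences are read cyclically, on the assumption that the code is a closed boundary walk.

```python
def cyclic_pairs(code: str) -> set[str]:
    """All pairs of cyclically consecutive symbols of the code."""
    n = len(code)
    return {code[i] + code[(i + 1) % n] for i in range(n)}


def classify(code: str) -> str:
    """Classify a boundary-edges code by the criteria of this section."""
    if "1" not in code:
        return "convex"            # Definition 3.1
    if cyclic_pairs(code) & {"11", "12", "21"}:
        return "other"             # fails Proposition 3.4
    if "2" not in code:
        return "pseudo-convex"     # Definition 3.4
    return "quasi-convex"          # quasi-convex but not pseudo-convex


if __name__ == "__main__":
    for code in ("55", "5351", "513432", "52441"):
        print(code, classify(code))
```

Because a code with no 11 and no 2 automatically has no 12 or 21, every code labelled pseudo-convex here also passes the quasi-convexity test, in line with Proposition 3.5.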
Here are some small examples. Naphthalene 55 is convex, phenanthrene 5351 is pseudo-convex and benzo[a]pyrene 513432 is quasi-convex (but not pseudo-convex). A smaller example of a quasi-convex benzenoid that is not pseudo-convex is described by 52441. They are shown in Figures 4 and 5. The BEC 52441 belongs to the "pistol" polyhex [25], named for its shape. BECs apply equally to benzenoids and polyhexes.

We have developed software that transforms the boundary-edges code into a description of a benzenoid via the positions of its hexagons in the hexagonal tessellation of the plane, as well as a tool that can draw the corresponding benzenoid. We can also compute several parameters such as the convexity deficit (which, of course, is obtained directly from the BEC). We present computational results in Tables 1 and 2. Table 1 lists small benzenoids, together with their names and basic properties. Table 2 lists some of the infinite families of benzenoids. Some representatives of the families presented in Table 2 are depicted in Figure 6.

Figure 6: Examples of the families defined in Table 2.

Extremal convexity deficit
Clearly, benzenoids with a small number of hexagons cannot have a large convexity deficit. For instance, benzenoids with up to 5 hexagons have convexity deficit at most 3. We call a benzenoid with h hexagons extremal if it attains the maximum convexity deficit mcd(h) among all benzenoids with h hexagons, and we denote by ex(h) the number of extremal benzenoids with h hexagons.
We performed a computer search to find extremal benzenoids among all benzenoids with h hexagons for all h ≤ 18. The results are summarised in Table 3. In particular, we noticed that only one extremal benzenoid is pericondensed. It has h = 6 hexagons, is described by the boundary-edges code 533244111 and is depicted in Figure 7(a). Moreover, all other extremal benzenoids with h ≤ 6 are unbranched. There is a unique smallest branched extremal benzenoid, having h = 7 hexagons. It has boundary-edges code 523315151112 and is depicted in Figure 7(b). For h ≥ 14, it appears that all extremal benzenoids are branched.

We were able to find an interesting family of benzenoids, one member for each number of hexagons. We call them spiral benzenoids. Note that the spiral benzenoid S(h) attains the maximum value of convexity deficit among all unbranched catacondensed benzenoids.

It is easy to find the extremal fusenes in the subclass F . A small example is [6]helicene in Figure 1(d).

Proof. Each unbranched catacondensed fusene can be described with a boundary-edges code $5\,s_1 s_2 \ldots s_{h-2}\,5\,\bar{s}_1 \bar{s}_2 \ldots \bar{s}_{h-2}$, where $s_i + \bar{s}_i = 4$ for all 1 ≤ i ≤ h − 2. It is clear that by setting $s_i = 1$ for all i = 1, . . . , h − 2 we will obtain one of the fusenes, let us denote it by F, with the longest possible subcode c such that $\mathrm{sum}(c)/\mathrm{len}(c) < 2$. If h is large enough, then c will contain h − 2 symbols 1, one symbol 5 and a certain number, let us denote it by ℓ, of symbols 3. We are looking for the largest possible ℓ such that

$$\frac{(h-2) + 5 + 3\ell}{(h-2) + 1 + \ell} < 2. \qquad (1)$$

Equation (1) is equivalent to

$$\ell < h - 5 \qquad (2)$$

when h + ℓ > 3 (this holds for large enough h since ℓ ≥ 0). From Equation (2) it follows that we can take ℓ = h − 6. This is valid if h ≥ 6, and the convexity deficit then equals (h − 2) + 1 + (h − 6) = 2h − 7.
If h < 6, then there are only 4 fusenes to analyse. By manual inspection we can see that the convexity deficit in each case is h − 2.

Proof. First, observe that the code 533323 is a subcode of S(h) for all h ≥ 7.
For contradiction, suppose that there exists an unbranched benzenoid B such that cd(B) = 2h − 7. Let c be the code which remains when the maximal subcode d, for which $\mathrm{sum}(d)/\mathrm{len}(d) < 2$, is erased from code(B). Since the code of a catacondensed benzenoid with h hexagons has length 2h − 2 and sum 4h + 2, we have len(c) = 5 and

$$\frac{4h + 2 - \mathrm{sum}(c)}{2h - 7} < 2.$$

If h is large enough, we obtain sum(c) > 16. The code c contains 5 symbols, each of which is an element of the set {1, 2, 3, 5}. One of the symbols must be 5 (otherwise the sum cannot be greater than 16). Also, the code cannot contain the symbol 5 twice if the benzenoid is large enough (the corresponding hexagons are located at opposite ends of the chain). From sum(c) > 16 it follows that all the other symbols have to be 3. The code 353 cannot be a subcode of a benzenoid due to geometric restrictions. The only remaining option is c = 53333, which again cannot be a subcode of a benzenoid, a contradiction. Therefore, S(h) attains the maximal convexity deficit among all unbranched benzenoids on h hexagons.
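The count for unbranched catacondensed fusenes obtained earlier in this section (convexity deficit 2h − 7 for h ≥ 6 and h − 2 for h < 6) can be checked numerically. The sketch below makes two assumptions: the fusene with all s_i = 1 has the code 5 1…1 5 3…3 with h − 2 symbols of each kind, as follows from s_i + s̄_i = 4, and the convexity deficit is the length of the longest cyclic subcode with sum/len < 2. The names `helicene_code` and `convexity_deficit` are illustrative.

```python
def convexity_deficit(code: str) -> int:
    """Longest cyclic window d with sum(d) < 2*len(d) (assumed definition)."""
    digits = [int(ch) for ch in code]
    n = len(digits)
    best = 0
    for start in range(n):
        total = 0
        for length in range(1, n + 1):
            total += digits[(start + length - 1) % n]
            if total < 2 * length:
                best = max(best, length)
    return best


def helicene_code(h: int) -> str:
    """Code 5 1^(h-2) 5 3^(h-2) of the helicene-like fusene (assumes h >= 2)."""
    return "5" + "1" * (h - 2) + "5" + "3" * (h - 2)


if __name__ == "__main__":
    for h in range(2, 12):
        expected = h - 2 if h < 6 else 2 * h - 7
        print(h, helicene_code(h), convexity_deficit(helicene_code(h)), expected)
```

The brute-force values agree with the closed forms for every h tried here, which supports the reconstruction but is of course not a substitute for the proof.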

Conclusion
In this contribution we have briefly revisited several families of benzenoids that have been studied in the past. Most of them are taken from the book by Cyvin and Gutman [15, p. 62].
Here they are defined rigorously by the boundary-edges code instead of relying on pictorial representations. We considered extremal benzenoids with respect to convexity deficit. Table 3 summarises all small cases up to 18 hexagons. BECs of these benzenoids are stored in [6]. We observed from these data that some clear patterns emerged.
In particular, let F(h, k) denote the number of benzenoids on h hexagons having convexity deficit equal to k. Note that F(h, mcd(h)) = ex(h) and F(h, k) = 0 for all k > mcd(h). The motivation for the previous statement comes from computational investigation for small values of h.
Our empirical studies show an interesting picture of extremal benzenoids. It seems that: (1) there is only one extremal benzenoid that is pericondensed; (2) there are only finitely many extremal unbranched pericondensed benzenoids; (3) all extremal benzenoids for h ≥ 14 are branched; (4) there is no upper bound on the number of branching points of extremal benzenoids as h tends to infinity.
These observations could be formulated as conjectures and are a subject of further research.