A dynamic model of CT scans for quantifying doubling time of ground glass opacities using histogram analysis.

We quantify a recent five-category CT histogram based classification of ground glass opacities using a dynamic mathematical model for the spatial-temporal evolution of malignant nodules. Our mathematical model takes the form of a spatially structured partial differential equation with a logistic crowding term. We present the results of extensive simulations and validate our model using patient data obtained from clinical CT images from patients with benign and malignant lesions.


1.
Introduction. Non-small-cell lung carcinomas (NSCLC) are the most common epithelial lung cancers. The development of thin slice CT (computed tomography) scans, coupled with new recommendations for lung cancer screening in high risk patients, has led to increased detection of subtle pulmonary subsolid or nonsolid nodules in the lungs [13]. CT scan x-rays measure these nodules, also known as ground glass opacities (GGOs), as the partial filling of air spaces in the lungs by exuded fluids. Published recommendations [4], [20], [21], [22] for how to follow GGOs over time depends only on nodule size and the presence or absence of a solid component. Recent work has demonstrated the utility of volumetric CT (vCT) for diagnosis of cancer in solid nodules by measuring growth rate over time. For these cancers, which include adenocarcinoma, a growth rate given by a volume doubling time (DT) less than 400 days is predictive of malignancy [12]. However, GGOs often grow slowly in size, thus giving a high false negative rate when using nodule volume as the imaging parameter. Additionally, GGOs can be difficult to segment on CT, making assessment of growth using vCT problematic. In this work, we investigate the potential to assess GGO growth based on a quantitative change in its 3-D density histogram, irrespective of the nodule size or presence of a solid component.
A recent report correlated five categories of CT histogram with histopathological characteristics and recurrence-free survival times [15]. Our objective is to model these five qualitative GGO measurement histogram categories, and their interpretations of tumor progression, to a quantitative dynamic mathematical model of tumor growth, which also allows estimation of tumor DT.
The mathematical model we use for the spatial-temporal evolution of a GGO is a diffusive logistic partial differential equation. We assume cell mass grows almost exponentially in an early time phase from an initial condition consisting of a small nodule, but ultimately slows in growth as time advances. CT scans are quantified in Hounsfield units (HU ), which measure radio-density. Since HU reflect tissue density as the partial filling of air spaces in the lungs by exuded fluids, it is possible for the tumor to increase in density without increasing in physical size on CT, by tumor cells gradually filling in available lung air space (see Figure 1). Therefore, visually observed CT scans may show boundaries of the tumor that do not change for a considerable amount of time, but which may increase in density. This change is reflected in the CT histogram; hence, it may be possible to quantify tumor growth based on subtle changes in the CT histogram.
We identify the five histogram categories formulated in [15], which are based on qualitative HU histogram signatures, with the outputs of our mathematical model, which is based on the time dependent spatial density u(t, x) of tumor cells in a spatial region Ω of the lung. The identification at a given time t is based on the fraction of values of CT scan histogram output and model output u(t, x) in specified subintervals of [0, 1]. The growth and diffusion parameters in the model equation are used to identify the connection over a time series of the two outputs. The model output is then used to identify the DT values for the time series of CT scan histograms. We illustrate the usability of the model for diagnosis of lung cancer with its comparison to CT lung scan data through four clinical patient case studies.
The Tennessee Valley Healthcare System VA Hospital Institutional Review Board approved the analysis of the anonymized CT scan data used in this paper and waived the need for informed consent.

Materials and methods.
2.1. Mathematical model. It has been recently documented that spatial intratumor heterogeneity plays an important role in lung cancer development at both the micro-molecular and at the macro-visible level [4], [20]. At the microscopic level and at the early stages of pulmonary adenocarcinoma in situ (previously bronchioloalveolar cell carcinoma), cancer cells align along alveolar walls in a so-called lepidic pattern. As the tumor invades the air spaces, it becomes more dense on CT.
Mathematical models of tumor growth in spatial regions have been developed by many researchers, including [1], [3], [10], [11], [17], [18], [24], [25]. Many mathematical models have been designed specifically to connect to CT scan imaging, including [2], [5], [8], [9], [16], [26], [27]. Our goal is to develop a mathematical model that aids lung CT scan analysis, and therefore our model captures tumor spatial growth dynamics at the macro-visible level. Our model has the following form of a diffusion partial differential equation with a growth-limiting logistic term: In the model above u(t, x) stands for the density of tumor cells at time t and spatial position x ∈ Ω, where Ω ⊂ R 2 is the observed physical area of the lung (typically a rectangular or disc-like area). We focus here on the 2-dimensional case, which exhibits the essential features of the underlying dynamics of lung tumor growth, and is also comparable to the clinical appearance of CT scan patient data. The 2-dimensional region Ω ⊂ R 2 can be viewed as representative thin slice of the tumor nodule. In future work we will consider a domain Ω ⊂ R 3 , which is more realistic, but with numerical simulations much more time consuming. Note that for simplicity we imposed Dirichlet boundary condition in our model (1)- (2). This is because we are interested in the short term behavior of solutions, with initial tumor cell distributions supported at (or near) the center of the domain. The parameter a is the logistic growth rate. The parameter b is the diffusion coefficient, which determines the speed of spatial tumor propagation. In the examples the parameters a and b were qualitatively determined to match the CT histograms and the model outputs for each patient. In future work we will use formal optimization methods to specify these parameters. The initial conditions u 0 (x) were determined by random choice of clusters of Gaussians and then chosen for compatibility with the CT data. In future work we will develop formal methods for assigning these initial conditions specifically to patient data.
The maximum of u(t, x) at any x ∈ Ω is u m − u b (x). The units of u(t, x) are density units of tumor mass per unit area, which we scale to allow comparison with CT scan GGO) HU values. We take the carrying capacity parameter u m = 1100 as the maximum cell density at any location. We convert u(t, x) units into CT scan HU by subtracting 1000. Most body soft tissue has HU values somewhere between water (HU =0) and blood (HU ≈ 50) due to the high iron content in blood; hence the upper limit of our histogram scale of +100 (or 1100 on the u(t,x) scale). Thus, u(t, x) values range between 0 and 1100, corresponding to HU values between −1000 and 100.
In the subsequent section when we present our simulation results, the cell density u(t, x) will be compared to histograms represented in HU , which are volume averaged values of mixtures of air and water, HU air = 1000, HU water = 0 [14]. Normal lung histograms are centered around HU = −750, reflecting about 75% air. As a tumor grows, more tissue density (water) displaces the air and typically shifts the histogram to the right. Note that Hounsfield units are integer valued.
The spatial growth of the tumor in model (1)-(2) is limited by the normal background lung cell distribution, denoted by the time-independent background density function u b (x). Tumor growth concentrates in micro-environmental regions of lung tissue vascularization [4], where the background density u b (x) is higher. The initial values and parameters are qualitatively fitted to each patient CT scan histogram data. In future work, formal optimization procedures will be developed for quantitative fittings based on large numbers of patient data.
The proof of existence of unique solutions of model (1)-(2) is provided in the Appendix. Note that, here we are mainly interested in the early transient behavior of solutions of the model, and not the long-term asymptotic behavior.

2.2.
CT scan histogram categories. The five CT scan histogram categories presented in [15] are summarised in Table 1 below. The classification of these categories is qualitative and subject to interpretation. The classifications of patient examples in [15] were each constructed by visual assessment of two expert observers, using a decision tree algorithm, with disagreements resolved by consensus. The histograms in the study in [15] were given in terms of continuous smoothed-out renderings of the histogram bar graphs, which allowed easier determination of category type. In our study we use actual histogram bar graphs, which preserve more information. In general, the classification of category for a given patient data set is necessarily subjective, and in fact, some patient data in our database do not readily fit any of the classifications. Our main goal is to construct a model that fits patient CT scan histogram data, rather than a model that fits the interpretation of these data according to the classification scheme in Kawata et al. We believe that our model simulations will aid in the designation of these categories for individual patients. To compare model output to patient data for a time series of CT scan histograms for a given patient, we will use a quantitative determination of the fractions of both CT scan histogram outputs and model (1)-(2) outputs. The CT scan fractional histogram outputs are the fractions of histogram bar heights in a given range of HU . The fractional model outputs are the integrals of the density function u(t, x) over a given range of values, divided by the integral of all values of the density function u(t, x), with both integrals over all of Ω. We assign three output ranges as presented in Table 2 below (other choices are also possible).
We use the output fractions f 1 , f 2 , f 3 to compare a time series of CT scan histograms for a given patient with the model (1)-(2) by specifying a time-independent background density u b (x), an initial condition u 0 (x) (corresponding to the baseline histogram), the logistic parameter a, and the diffusion parameter b. We then calculate the doubling time DT of the tumor from the model output.
3. Results. We provide here the results of simulations for four case studies, all compared to patient data. Our patient data and model simulation codes (developed in MATHEMATICA) are available upon request to the authors. All histograms, for both CT scan data and model simulations, are constructed with binning width of 10 HU per bin. We note that for each simulation the initial density u 0 (x) is formulated as a 2-dimensional Gaussian, and the background density u b (x) is formulated as an array of 2-dimensional Gaussians, which are parameterized so that the histogram of u 0 (x) + u b (x) corresponds approximately to the baseline CT histogram in each simulation. These inputs are viewed as representative of the tumor at the macrolevel. In future work these inputs will be formulated at the micro-level as in Figure  1, which requires much greater detail and much more extensive computing resources for running the simulations.
3.1. Patient 1. Patient 1 is an example of a biopsy proven benign GGO. In Figure  2 we show CT scan images for Patient 1 at five time points. Patient 1 data consists of CT scan histograms in a series of five time points over approximately two years. These five histograms, with their category type and fractional values f 1 , f 2 , f 3 , are given in Figure 3. For the model simulation of Patient 1, we have taken the time points (in days) as t 0 = 0, t 1 = 87, t 2 = 228, t 3 = 643, and t 4 = 692, corresponding to the dates in Figure 3. In Figure 4 we graph the initial tumor spatial density plus background density u 0 (x) + u b (x), in alignment with the CT scan histogram at baseline t = 0, shown in Figure 3, and the tumor spatial density u(t, x) at t = 692.
In Figure 5 we graph the histogram plots (with bin width 10) of the model simulation of Patient 1 at the five time points as in Figure 3, where the values of u(t, x) are shifted by −1000 to correspond to HU . The category type and fractional values f 1 , f 2 , f 3 at each time point are given in the Figure 5 legend. The histograms in Figure 3 and Figure 5 show relatively good alignment, all with type β. The histogram fractions for the CT scan data and the model simulations are compared in Figure 6. From these histogram plots we see that the tumor does not progress in category type. The parameters for Patient 1 and the doubling time obtained from the simulation are shown in Table 3. 3.2. Patient 2. Patient 2 is an example of a benign GGO nodule. In Figure 7 we show CT scan images for Patient 2 at six time points. Patient 2 CT scan histograms at six time points, their category type, and fractional values f 1 , f 2 , f 3 , are given in Figure 8. For the model simulation of Patient 2, we have taken the six time points (in days) as t 0 = 0, t 1 = 107, t 2 = 198, t 3 = 386, t 4 = 568, and t 5 = 932 corresponding to the dates in Figure 8. In Figure 9 we graph the initial tumor spatial density plus background density u 0 (x) + u b (x), in alignment with the CT scan histogram at baseline t = 0, shown in Figure 8, and the tumor spatial density u(t, x) at t = 932.
In Figure 10 we show the histogram plots (with bin width 10) of the model simulation of Patient 2 at the six time points as in Figure 8, where the values of u(t, x) are shifted again by −1000 so that they correspond to HU . The category type and fractional values f 1 , f 2 , f 3 at each time point are given in the Figure 10 legend. The histograms in Figure 8 and Figure 10 show relatively good alignment, all with type β. The histogram fractions for the CT scan data and the model simulations are compared in Figure 11. From these histogram plots we see that the tumor does not progress in category type. The parameters for Patient 2 and the doubling times obtained from the simulation are shown in Table 3. The doubling time obtained from the simulation is in the range of a benign nodule.
3.3. Patient 3. Patient 3 is an example of atypical cells highly suspicious for adenocarcinoma by biopsy. In Figure 12 we show CT scan images for Patient 3 at four time points. Patient 3 CT scan histograms (with bin width 10 HU ) at the four time points, with their category type and fractional values f 1 , f 2 , f 3 are shown in Figure 13. For the model simulation of Patient 3, we have taken the time points (in days) as t 0 = 0, t 1 = 574, t 2 = 826, t 3 = 917 (corresponding to the dates in Figure 13), and two additional time points beyond the data times as t 4 = 1, 217 and t 5 = 1, 517. In Figure 14 we graph the initial tumor spatial density plus background density u 0 (x) + u b (x), in alignment with the CT scan histogram at baseline t = 0, shown in Figure 13, and the tumor spatial density u(t, x) at t = 917.
In Figure 15 we show the histogram plots of the model simulation for Patient 3 at the six time points as shown in Figure 13, where the values of u(t, x) are shifted by −1000 to correspond to HU . The category type and fractional values f 1 , f 2 , f 3 at each time point are given in the Figure 15 legend. The first four histograms in Figure 13 and Figure 15 show relatively good alignment, with type progression from β to γ. Through the two additional time points in the simulation we see the progression of the tumor through type γ. The histogram fractions for the CT scan data and the model simulations are compared in Figure 16. The parameters for Patient 3 and the doubling time obtained from the simulation are shown in Table  3. The doubling time obtained from the simulation is in the range of non-small cell lung cancer.
3.4. Patient 4. Patient 4 is an example of a proven adenocarcinoma that started as a GGO that increased in density on CT over time. In Figure 17 we show CT scan images for Patient 4 at four time points. Patient 4 CT scan histograms (with bin width 10 HU ) at four time points, with their category type and fractional values f 1 , f 2 , f 3 are shown in Figure 18. For the model simulation of Patient 4, we have taken the time points (in days) as t 0 = 0, t 1 = 239, t 2 = 423, t 3 = 471, and two additional time points beyond the data times as t 4 = 600 and t 5 = 750. In Figure  19 we show the graph of the initial spatial density u 0 (x), the background spatial density u b (x), and their sum, in alignment with the CT scan histogram at baseline t = 0 shown in Figure 18.
In Figure 20 we show the histogram plots of the model simulation for Patient 4 at the four time points as shown in Figure 18, where the values of u(t, x) are shifted by −1000 to correspond to HU . The category type and fractional values f 1 , f 2 , f 3 at each time point are given in the Figure 20 legend. The first four histograms in Figure 17 and Figure 20 show relatively good alignment, with type progression from β to γ. The histogram fractions for the CT scan data and the model simulations are compared in Figure 21. Through the two additional time points we see the progression of the tumor through type δ to type . The parameters for Patient 4 and the doubling time obtained from the simulation are shown in Table 3. The doubling time obtained from the simulation is in the range of non-small cell lung cancer. In Figure 22 we graph the total tumor mass from the model simulations for each patient over time, where mass is scaled to 1.0 at time 0. Patients 1 and 2 have smaller growth than Patients 3 and 4, corresponding to their smaller growth parameter a. The model simulations allow calculation of the tumor doubling times, as well as tracking of the tumor growth over the span of CT scan time series. The model simulations also allow growth projections for additional times beyond the CT scan data, as demonstrated for Patients 3 and 4 (see also Figure 22). 4. Discussion. In a recent paper [15], a qualitative five-category classification method was proposed for analyzing NSCLC, and its utility justified using statistical tools. The results indicated a satisfactory inter-observer agreement simply through visual assessment of CT histograms. Our goal here has been to quantify the five categories in [15] in terms of a dynamic spatial model of tumor growth; and to connect the temporal dynamics of the categories to tumor DT. We have compared CT scan data and model outputs for four patient studies. For each patient, we see good agreement between these data and model outputs, in terms histogram categories and HU fractional ranges.
In the current work we hypothesized that the five categories identified in [15] actually correspond to temporal tumor progression. Indeed, Kawata [15] already speculated that change from type α to β and from β to γ may indicate tumor progression.
Our results show that model (1)-(2) supports the five category classification in adenocarcinoma in situ. Further, these five categories can be viewed as a hypothesized 5-step lung cancer progression theory. Moreover, since it takes into account the spatial heterogeneity of the tumor, which is particularly important for irregular nodules investigated here, the model gives us a tool to estimate tumor mass doubling times using CT histogram data only.
Major challenges for application of the model (1)-(2)are the identifications of the initial tumor nodule characteristics, the background non-tumor bias parameter u b (x), the carrying capacity parameter u m , the spatial diffusion parameter b, and tumor growth parameter a. Our goal here has been to demonstrate that model (1)-(2) does correlate well with tumor growth data given by CT scan data represented with GGO histograms. Formal procedures to quantify these identifications of initial data and parameters for general patient data will be carried out in future work.

5.
Outlook. Our model already shows very good agreement with patient data, and the 5-category classification of GGOs. Future improvements of the mathematical model may involve: • Full 3-dimensional simulations.
• Systematically analyze the simulation outcomes as functions of the model parameters and initial condition (transient vs asymptotic behavior, is there a globally stable steady state?).
• Inclusion of nonlinear diffusion to account for a more realistic description of tumor spatial growth (in particular to model competition effects).
• To include different type of placement processes for the tumor cells (other than diffusion) to account for the complex spatial structure of the lung.
Estimation of tumor doubling time in GGOs has not been described. This work offers a method to compute growth rate of GGOs as a predictive biomarker of malignancy, similar to that used for solid nodules using volumetric CT. Further work is needed to investigate the impact of different reconstruction algorithms and reconstructed image quality on the estimate of GGO growth rate.
6. Supporting information. Global behaviour of solutions. The basic mathematical theory of general classes of nonlinear reaction diffusion equations of the type (1)-(2) is well understood. However, for completeness, here we provide a concise proof of the global existence and positivity of solutions of our model in the biologically relevant state space of Lebesgue integrable functions L 1 (Ω) =: X . In particular, to establish the global existence of mild solutions we implement a framework as in [23] for a structured population model, see also [19].
We set K := L 1 + (Ω) (the positive cone of L 1 , which is closed) and we recast model (1)-(2) in the form of a semilinear abstract Cauchy problem as follows. where We say that the abstract semilinear problem (3) satisfies the sub-tangential condition (see e.g. [23]) with respect to K, if where T is the linear semigroup generated by A, and d is the usual distance function. We also recall the notation (·, ·) − introduced for a semi-inner product on X . Below X * denotes the dual of the Banach space X , and (·, ·) the natural pairing between elements of X and X * .
We recall the following result from [23], see also [19]. Let X , K, A and F as defined above, and assume that F is locally Lipschitz and bounded. Further assume that the sub-tangential condition (7) holds, and that there exist ω, κ ∈ R such that (A u, u) − ≤ ω|u| 2 , for all u ∈ D(A); and (F (u), u) − ≤ κ|u| 2 , for all u ∈ K. Then, for each u 0 ∈ K, there exists a unique mild solution u(t) to (3) for all t > 0.
We now apply this result to our model (1)-(2).
Proof. It follows from the assumptions that the densely defined operator A defined in (4)-(5) generates a positive strongly continuous semigroup T (t) on L 1 (Ω). Note that the nonlinear operator F cannot be defined on the whole state space X , but F is locally Lipschitz and maps bounded sets B ⊂ K into bounded sets F (B). To establish the global existence of solutions, note that in our situation since T leaves K invariant, the sub-tangential condition (7) simplifies as follows (see also Lemma C in [23]).
which is easily seen to hold true, as for all u ∈ K we have F (u) < 0. Next note that in our setting we have (9) Hence for every u ∈ K we may take u * ≡ ||u|| 1 = Ω u (constant function), which shows that (F (u), u) − ≤ 0 holds. Finally, note that (A u, u) − ≤ ω|u| 2 , for all u ∈ D(A) holds with ω := s(A) < ∞, the spectral bound of A.
Our model (1)-(2) always admits the trivial steady state u * ≡ 0. For a large enough, the existence of a strictly positive steady state can be established using the general framework developed in [6]. In particular, we can define a parametrized family of linear operators as follows: It is then shown that for a large enough, s(Φ 0 ) > 0 holds, and that the function defined as f : α → s (Φ α v ) is monotone decreasing for every v ∈ K. This then allows one to define a fixed point map on the level set S := {v ∈ K | s(Φ v ) = 0}, which yields the existence of a positive steady state of (1)-(2), see [6] for more details.
We also note that applying earlier results by Cantrell and Cosner from [7] (see in particular Theorem 3.1 in [7]) would also allow us to obtain sufficient conditions for the existence of a globally stable unique positive steady state for a large enough.       u(0, x, y). B: the initial spatial density of the tumor plus the background spatial density u(0, x, y) + u b (x, y). C: the tumor spatial density u(692, x, y) at time t = 692 days.