PROBABILITY OF ESCHERICHIA COLI CONTAMINATION SPREAD IN GROUND BEEF PRODUCTION

Human illness due to contamination of food by pathogenic strains of Escherichia coli is a serious public health concern and can cause significant economic losses in the food industry. Recent outbreaks of such illness sourced from ground beef production motivate the work in this paper. Most ground beef is produced in large facilities where many carcasses are butchered and various pieces of them are ground together in sequential batches. Assuming that the source of contamination is a single carcass, and that ground beef from a particular batch has been identified downstream of the production facility as contaminated by E. coli, the probability that previous and subsequent batches are also contaminated is modelled. This model may help the beef industry to identify the likelihood of contamination in other batches and potentially save money by not needing to cook or recall unaffected batches of ground beef.

1. Introduction. The bacterium Escherichia coli is commonly found in the intestines of warm-blooded organisms. Most of its strains are harmless and are a beneficial part of the gut flora. Pathogenic E. coli strains, in particular O157:H7, cause illness in humans. Scallan et al. [12] indicate that a significant proportion of food-borne illness acquired in the U.S.A. is caused by pathogenic strains of E. coli. These strains are primarily transmitted from cattle to humans through consumption of meat, especially undercooked ground beef, although infection can also occur after consumption of dairy products [15,16] or from other sources.
In the western world, most meat production is concentrated in large meat processing plants, and any outbreak of E. coli contamination may affect many people over a wide area. In addition to the health hazard, outbreaks cause large economic losses and have a negative impact on the beef industry. The existing food safety regulations in Canada and the U.S.A. require the removal of all production and raw sources associated with an identified contamination event. This leads to the recall of large amounts of beef, much of it likely uncontaminated, and consequently […] At the temperatures at which production occurs, it is assumed that the contaminant does not grow appreciably.
Ground beef is produced in batches. Each batch has input from several raw sources: typically one or more "lean" fresh sources and "fat" fresh sources, but often also frozen sources and other sources such as Boneless Lean Beef Trimmings (BLBT), also called Lean Finely Textured Beef (LFTB), which is extracted from trimmings via a centrifuge at temperatures around 38 °C. BLBT is usually free of bacterial contamination, since it is typically treated to kill bacteria before being used. The meat in a batch is well mixed and ground together, so that if any contamination is present on any of the raw source material input to the batch, the entire batch is deemed to be contaminated.
As a carcass is processed, parts of it are trimmed off and put into raw source bins to be used as input to the ground beef batches. Typically, a carcass gets spread over a region in the raw source. The size of this region, and the probability of a piece of the carcass being present at any point in that region, are highly dependent on the production process. If carcasses are spread across sources, measures of carcass overlap between any two sources are also necessary for the model. Some simplifying assumptions are made regarding how carcasses are spread in a particular raw source and how a raw source is used in ground beef production, as illustrated in Figure 1:
• For each raw source, every carcass is the same. The number of pieces contributed by each carcass, the various masses of those pieces, and the manner in which those pieces are spread throughout a region of the raw source are the same for each carcass present in the raw source.
• Material within a raw source is ordered and used as input to the ground beef production batches in that order.
• The carcasses are processed sequentially. The regions of a raw source through which pieces from sequential carcasses are spread overlap one another but are shifted forward in the ordering. (Boundary effects for carcasses near the beginning and end of the raw source do alter the spread distributions of pieces from these carcasses; see Section 2.2.)
The second assumption allows us to define a "mass location" in each raw source: mass from a particular raw source used in batch number b comes from locations just prior to those for the mass from the same source used in batch number b + 1.
In other words, the carcass pieces are treated as points and the production geometry is linear. Although these assumptions are likely not valid for a real production facility, they are useful simplifications for the modelling process. The mixing that likely occurs within the raw source can be partly accommodated by the choice of carcass piece distribution adopted in the model. If the carcasses within a source were heterogeneous and this aspect were deemed important, then detailed information about the character of this heterogeneity would be required before it could be built into the model. Suppose there are B batches of ground beef produced in a production cycle and S raw sources used as input to these batches. Not all sources need be used in all batches. It is assumed that a particular batch of ground beef has been identified as contaminated. This batch is referred to as the "hot" batch, and it is identified as batch number h. The contamination is assumed to originate from a single hot raw source and, in turn, from a single hot carcass in this raw source. Material from the raw sources is input sequentially to consecutive batches of ground beef.
The following subsections describe in detail how carcasses are spread in a raw source, which determines their likelihood of appearing in a given batch of ground beef, and how, given a particular hot batch h, the model computes 1) the probability that any particular source is the hot source, 2) the probability that a particular carcass in the hot source is the hot carcass, and 3) the probability that the hot carcass is also present in other batches. These probabilities are then combined in Equation (13), at the end of this section, to determine the probability that other batches are contaminated. Table 1 lists the symbols used in the model.

2.2. Spread of carcasses in a raw source. It is assumed that each carcass present in a particular raw source s contributes $p_s$ pieces and that the average mass of these pieces is $a_s$. Let $C_s$ be the total number of carcasses in the raw source; then the total mass in the raw source is $M_s = C_s p_s a_s$. The pieces from each carcass are distributed throughout this raw source as illustrated in Figure 2 and described below.
For all pieces from a particular carcass c, the "centre" of the piece distribution is located in the source at the mass location

$$\mu_c = \left(c - \tfrac{1}{2}\right) p_s a_s, \qquad c = 1, \dots, C_s. \tag{1}$$

Thus the carcass centres are spread evenly across the source. Let $F_s$ be the common base probability density function for all pieces from all carcasses in raw source s. Then, excluding boundary effects, the probability density function for pieces from carcass c in this raw source would be $F_s(x - \mu_c)$, where x is the mass location in the source. However, since the source is finite, carcasses with centres near the boundary would typically have $F_s$ nonzero beyond the beginning or end of the source. To prevent this, the distribution function is reflected at the boundaries ($x = 0$ and $x = M_s$) and the reflections are added to the base. Therefore, the probability density function for pieces from carcass c in source s is

$$G_{sc}(x) = F_s(x - \mu_c) + F_s(-x - \mu_c) + F_s(2M_s - x - \mu_c), \qquad 0 \le x \le M_s. \tag{2}$$

The first term on the right side of (2) is just the base probability function shifted to the centre, $\mu_c$, for this carcass; the second and third terms are the reflections around $x = 0$ and $x = M_s$, respectively. For carcasses near the middle of the raw source, the reflection terms do not contribute anything unless the breadth of the base probability function $F_s$ is very large. In addition, $F_s(x)$ is restricted to be zero for $|x| > M_s$, so that a single reflection at each end incorporates all of the support of $F_s$. For each piece from carcass c, the probability that it is located between mass locations $x_1$ and $x_2$ is

$$Q_{sc}([x_1, x_2]) = \int_{x_1}^{x_2} G_{sc}(x)\,dx. \tag{3}$$

More generally, define $Q_{sc}(R)$ for some set R as the integral of $G_{sc}(x)$ over R.

Table 1. Symbols used in the model.
$M_{sb}$: Mass from source s input to batches 1 through b − 1.
$B_{sb}$: Interval of mass locations in source s input to batch b.
$A_{sc}(B_{sb})$: Probability that carcass c is absent from the set $B_{sb}$, that is, carcass c contributes no pieces to batch b through source s.
$f_s$: Fraction of fat in raw source s.
$g_s$: Relative susceptibility factor for contamination of source s.
$V_{s_1 s_2}$: Fraction of carcasses present in both raw sources $s_1$ and $s_2$.
The above assumptions put some restrictions on the density function $F_s$. In particular, the total expected mass of all carcasses in a particular set must add up to the size of the set. That is, for any set $R \subseteq [0, M_s]$,

$$\sum_{c=1}^{C_s} p_s a_s\, Q_{sc}(R) = |R|.$$
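To make (2), (3), and the mass-balance constraint concrete, here is a minimal Python sketch that evaluates the reflected density numerically. It assumes a uniform base density $F_s$ (the simplest member of the piecewise-linear class described next) and carcass centres at $(c - 1/2) p_s a_s$, one way of spreading the centres evenly. The function names, the midpoint-rule integration, and the numbers (8 kg per carcass, 250 carcasses, $N_1 = 3$) are illustrative assumptions, not the authors' C implementation.

```python
"""Numerical sketch of the reflected piece density G_sc and interval probability Q_sc.

Illustrative assumptions: uniform base density F_s and carcass centres
mu_c = (c - 1/2) * p_s * a_s. Names are hypothetical.
"""
from typing import Callable

Density = Callable[[float], float]


def uniform_base(half_width: float) -> Density:
    """Uniform base density F on (-half_width, half_width)."""
    height = 1.0 / (2.0 * half_width)
    return lambda x: height if abs(x) < half_width else 0.0


def reflected_density(F: Density, mu: float, M: float) -> Density:
    """G(x) = F(x - mu) + F(-x - mu) + F(2M - x - mu) on [0, M], zero elsewhere."""
    def G(x: float) -> float:
        if x < 0.0 or x > M:
            return 0.0
        return F(x - mu) + F(-x - mu) + F(2.0 * M - x - mu)
    return G


def interval_prob(G: Density, x1: float, x2: float, n: int = 20000) -> float:
    """Q over [x1, x2], approximated with the midpoint rule."""
    h = (x2 - x1) / n
    return h * sum(G(x1 + (i + 0.5) * h) for i in range(n))


def expected_mass_density(x: float, C: int, unit: float, F: Density) -> float:
    """Pointwise density of the constraint's left side, sum_c p_s a_s G_sc(x); should equal 1."""
    M = C * unit
    return sum(unit * reflected_density(F, (c - 0.5) * unit, M)(x) for c in range(1, C + 1))


if __name__ == "__main__":
    unit = 8.0                    # p_s * a_s in kg (hypothetical)
    C = 250                       # carcasses in the source (hypothetical)
    F = uniform_base(3 * unit)    # N_1 = 3, so the height is 1 / (2 N_1 p_s a_s)
    G1 = reflected_density(F, 0.5 * unit, C * unit)
    print(interval_prob(G1, 0.0, C * unit))          # close to 1 despite the boundary
    print(expected_mass_density(100.3, C, unit, F))   # close to 1 in the interior
    print(expected_mass_density(5.1, C, unit, F))     # close to 1 near x = 0
```

With these centres, the reflections keep each carcass's total probability at one within [0, M_s], and the summed expected mass per unit length is one everywhere, including near both ends of the source.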
A number of functions $F_s$ may satisfy this constraint, but here only a class of even (about 0), piecewise-linear functions, in which each piece has a length that is a multiple of $p_s a_s$, is considered. An example of such a function is shown in Figure 2(a). A technical description of this class of functions follows. A function $F_s$ in this class may be defined in terms of parameters K, N, and H as below, where the subscript s on these parameters has been suppressed for readability:

$$F_s(x) = \begin{cases} H_{i-1}^{+} + \left(H_i^{-} - H_{i-1}^{+}\right) \dfrac{|x| - N_{i-1} p_s a_s}{(N_i - N_{i-1})\, p_s a_s}, & N_{i-1} p_s a_s \le |x| < N_i p_s a_s,\ i = 1, \dots, K, \\ 0, & |x| \ge N_K p_s a_s, \end{cases}$$

where the $N_i$ are integers (numbers of carcasses) satisfying $0 = N_0 < N_1 < N_2 < \cdots < N_K \le C_s$, and $H_i^{\pm} \ge 0$, $0 \le i \le K$, are the limiting values of $F_s$ at $N_i p_s a_s$ from the left (−) and right (+), respectively. Necessarily, $H_0^- = H_0^+$ and $H_K^+ = 0$. Also, $F_s$ must itself be a valid probability density, hence

$$2 \int_0^{N_K p_s a_s} F_s(x)\,dx = p_s a_s \sum_{i=1}^{K} \left(H_{i-1}^{+} + H_i^{-}\right)(N_i - N_{i-1}) = 1.$$

This is just the statement that the area under $F_s$ must equal one. This class of functions includes the uniform distribution, obtained with K = 1 and $H_0^{\pm} = H_1^- = 1/(2N_1 p_s a_s)$. If the parameters are chosen such that $H_i^- = H_i^+$, $1 \le i \le K$, then the function $F_s$ is continuous. Each raw source s may have a different distribution function $F_s$, and thus a different set of parameters K, N, and H.
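The following Python sketch builds a member of this piecewise-linear class from lists N, $H^-$, and $H^+$, and compares the closed-form area with a numerical integral. The list encoding, the function names, and the two-piece parameter values are hypothetical; `unit` stands for $p_s a_s$.

```python
"""Sketch of the even, piecewise-linear base density F_s defined by K, N, and H.

Hypothetical encoding: N = [N_0 = 0, N_1, ..., N_K]; H_minus[i] and H_plus[i]
are the left and right limits at N[i] * unit, where unit = p_s * a_s.
"""
from bisect import bisect_right
from typing import Callable, Sequence


def piecewise_linear_density(N: Sequence[int], H_minus: Sequence[float],
                             H_plus: Sequence[float], unit: float) -> Callable[[float], float]:
    """F(x) interpolates linearly from H_plus[i-1] at N[i-1]*unit to H_minus[i] at N[i]*unit."""
    breaks = [n * unit for n in N]
    K = len(N) - 1

    def F(x: float) -> float:
        r = abs(x)                        # F is even about 0
        if r >= breaks[K]:
            return 0.0
        i = bisect_right(breaks, r)       # breaks[i-1] <= r < breaks[i]
        t = (r - breaks[i - 1]) / (breaks[i] - breaks[i - 1])
        return H_plus[i - 1] + t * (H_minus[i] - H_plus[i - 1])
    return F


def total_area(N, H_minus, H_plus, unit) -> float:
    """Closed form: p_s a_s * sum_i (H+_{i-1} + H-_i) * (N_i - N_{i-1})."""
    return unit * sum((H_plus[i - 1] + H_minus[i]) * (N[i] - N[i - 1])
                      for i in range(1, len(N)))


def numeric_area(F, half_support: float, n: int = 100000) -> float:
    """Midpoint-rule integral of F over [-half_support, half_support]."""
    h = 2.0 * half_support / n
    return h * sum(F(-half_support + (k + 0.5) * h) for k in range(n))


if __name__ == "__main__":
    unit = 8.0
    H = 1.0 / (2 * 3 * unit)                       # uniform case, K = 1, N_1 = 3
    print(total_area([0, 3], [H, H], [H, 0.0], unit))
    # Two-piece continuous case (K = 2), hypothetical values chosen to integrate to 1
    N, Hm, Hp = [0, 2, 10], [0.0375, 0.005, 0.0], [0.0375, 0.005, 0.0]
    F = piecewise_linear_density(N, Hm, Hp, unit)
    print(total_area(N, Hm, Hp, unit), numeric_area(F, 10 * unit))
```

Both printed areas should be one up to rounding, which is the normalization condition above.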
The realized distribution of pieces from a particular carcass is dependent on the locations of all other pieces from all carcasses, because piece locations cannot overlap. In particular, the locations of pieces from a given carcass c are not independent selections from the density function G sc (x), nor are they only dependent on previous selections for the same carcass. However, assuming independence greatly simplifies the selection modelling and should introduce minimal error especially when there are a large number of carcasses. Independent selection is therefore employed in this model.

2.3. Probability of carcass presence in given batches. The probability that a carcass c from source s is input to batch b is computed as follows. The mass in each raw source is labelled according to the sequential batches for which it is used. Let $m_{sb}$ be the mass input from source s to batch b, and let $M_{sb}$ be the mass input from source s to batches prior to batch b, that is,

$$M_{sb} = \sum_{i=1}^{b-1} m_{si}.$$

The range of mass locations in raw source s that are input to batch b is then the interval

$$B_{sb} = [M_{sb},\ M_{sb} + m_{sb}];$$

see Figure 3. For source s, denote the probability of carcass c being absent from the set $B_{sb}$ as $A_{sc}(B_{sb})$. Since the selection of the $p_s$ pieces for carcass c is modelled as being independent of the selection of other pieces, $A_{sc}$ is given by

$$A_{sc}(B_{sb}) = \left(1 - Q_{sc}(B_{sb})\right)^{p_s}.$$

Therefore, the probability of carcass c from source s being present in batch b is

$$P_{sc}(B_{sb}) = 1 - A_{sc}(B_{sb}). \tag{7}$$

Consider two batches, h and j, that receive input from the same source s, and a particular carcass c from that source. The probability that this carcass has at least one piece that is input to batch h and at least one that is input to batch j is

$$P_{sc}(B_{sh}, B_{sj}) = 1 - A_{sc}(B_{sh}) - A_{sc}(B_{sj}) + A_{sc}(B_{sh} \cup B_{sj}). \tag{8}$$

The last term is present because the event that the carcass is absent from both batches is included in both of the previous terms but should only be counted once.
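A minimal Python sketch of this subsection follows. It assumes $Q_{sc}(B_{sb})$ has already been obtained by integrating $G_{sc}$ over the batch interval; all function names are hypothetical.

```python
"""Sketch of the batch intervals B_sb and the presence probabilities for one carcass.

Assumes Q_sc(B_sb) is already known; names are hypothetical.
"""
from itertools import accumulate
from typing import List, Tuple


def batch_intervals(masses: List[float]) -> List[Tuple[float, float]]:
    """B_sb = [M_sb, M_sb + m_sb], where M_sb is the mass used by batches 1..b-1."""
    starts = [0.0] + list(accumulate(masses))[:-1]
    return [(M, M + m) for M, m in zip(starts, masses)]


def absent(q: float, p: int) -> float:
    """A_sc = (1 - Q_sc)^p: none of the p independently placed pieces lands in the interval."""
    return (1.0 - q) ** p


def present(q: float, p: int) -> float:
    """Probability that at least one piece lands in the interval, as in (7)."""
    return 1.0 - absent(q, p)


def present_in_both(q_h: float, q_j: float, p: int) -> float:
    """Inclusion-exclusion, as in (8).

    Distinct batches use disjoint intervals, so Q of their union is q_h + q_j.
    """
    return 1.0 - absent(q_h, p) - absent(q_j, p) + absent(q_h + q_j, p)
```

For example, a carcass with p = 2 pieces and Q = 0.5 in each of two batches appears in both with probability 0.5 (one piece in each), whereas with p = 1 it can never appear in both.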

2.4. Identification of the hot source. The contamination in the hot batch, h, may be due to any of the raw sources that provided input to this batch, and these raw sources may have varying degrees of relative susceptibility to contamination. Contamination is known to be often carried on the fat, so the fraction of fat, $f_s$, in raw source s is an important factor. However, even sources with the same fat content may differ in their likelihood of being contaminated. For example, a frozen source may be much less likely to be contaminated than a fresh source. To account for these differences, the user must assign a relative susceptibility factor, $g_s$, to each source. The absolute size of these factors is not important, only their sizes relative to one another. Thus if the user specifies $g_1 = 0.5$, $g_2 = 1.0$, and $g_3 = 1.5$, then, other factors being equal, source 2 is twice as likely to be contaminated as source 1, and source 3 is three times as likely to be contaminated as source 1 and 1.5 times as likely as source 2. The probability that raw source s is the origin of the contamination in batch h is taken as

$$\Pr(s \text{ hot} \mid h) = \frac{f_s\, g_s\, m_{sh}}{\sum_{i=1}^{S} f_i\, g_i\, m_{ih}}. \tag{9}$$

Equation (9) is simply a weighted fractional contribution of mass from source s to the hot batch h: the larger the fraction of mass from source s in batch h, the more likely it is that the contamination came from source s. The weighting is the product of the fat fraction $f_s$ and the susceptibility factor $g_s$.
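Equation (9) amounts to dividing each product $f_s g_s m_{sh}$ by the sum of these products over the sources feeding batch h. The Python sketch below uses hypothetical masses and fat fractions together with the susceptibility values from the example above; the function name is illustrative.

```python
"""Sketch of Eq. (9): P(source s hot | batch h) is proportional to f_s * g_s * m_sh.

Input values are hypothetical.
"""
from typing import List


def hot_source_probs(m_h: List[float], f: List[float], g: List[float]) -> List[float]:
    """Weighted fractional contribution of each source to the hot batch h."""
    weights = [fs * gs * ms for fs, gs, ms in zip(f, g, m_h)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("batch h receives no weighted input from any source")
    return [w / total for w in weights]


if __name__ == "__main__":
    # Equal masses and fat fractions; g = 0.5, 1.0, 1.5 as in the text.
    probs = hot_source_probs([100.0, 100.0, 100.0], [0.2, 0.2, 0.2], [0.5, 1.0, 1.5])
    print(probs)   # in the ratio 1 : 2 : 3
```

Multiplying every $g_s$ by the same constant leaves the result unchanged, which is why only the relative sizes of the susceptibility factors matter.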

2.5. Identification of the hot carcass. Given that batch h is the hot batch, and assuming the contamination is due to a single hot carcass in raw source s (which contributes mass to batch h), then, with no additional information, the probability that a particular carcass c is the hot one is simply the relative probability of that carcass being present in batch h:

$$\Pr(c \text{ hot} \mid s \text{ hot}, h) = \frac{P_{sc}(B_{sh})}{\sum_{k=1}^{C_s} P_{sk}(B_{sh})}. \tag{10}$$

The right side of the above is computed using (7).
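As a sketch of (10), assuming the interval probabilities $Q_{sc}(B_{sh})$ for every carcass of source s are already available, the presence probabilities from (7) are simply rescaled to sum to one. The function name and inputs are hypothetical.

```python
"""Sketch of Eq. (10): relative presence probabilities of the carcasses of source s in batch h."""
from typing import List


def hot_carcass_probs(q_h: List[float], p: int) -> List[float]:
    """Rescale 1 - (1 - Q_sc(B_sh))^p over all carcasses c of the source."""
    presence = [1.0 - (1.0 - q) ** p for q in q_h]
    total = sum(presence)
    return [x / total for x in presence]


if __name__ == "__main__":
    # Hypothetical Q values for four carcasses, p = 2 pieces each
    print(hot_carcass_probs([0.0, 0.25, 0.5, 0.25], 2))
```

A carcass with no mass in batch h receives zero probability, and carcasses with more of their mass in batch h are proportionally more likely to be the hot carcass.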
2.6. Presence of the hot carcass in other batches. Assuming that a particular carcass c is the hot carcass and that this carcass contaminated batch h via input from source s, the final step is to compute the probabilities that this carcass is also present in each of the other batches. A particular carcass may be present in more than one raw source. This could occur, for example, in a production facility where certain trim from each carcass is placed in one raw source and other trim from the same carcass is placed in another source. Presence of carcass c from source s in another batch j may therefore be due either to input from the same source or to input of pieces of the same carcass present in another source. The model presented here uses a conditional probability of carcass overlap. The probability that carcass $c_1$ from source $s_1$ is the same as carcass $c_2$ from source $s_2$, given that $c_1$ from source $s_1$ is in the hot batch h, is denoted

$$\Pr(s_1 c_1 \equiv s_2 c_2 \mid s_1 c_1 \in h).$$

This probability must be assigned by the user. If sufficiently detailed information on the trimming process were available, this probability could reflect it, giving different values for different carcasses within each raw source. Without such detailed information, a coarser approximation of this probability can be assigned using a measure of source overlap, $V_{s_1 s_2}$, defined as the number of carcasses present in both sources divided by the total number of distinct carcasses in both sources. If n carcasses are present in both sources $s_1$ and $s_2$, then this overlap fraction is

$$V_{s_1 s_2} = \frac{n}{C_{s_1} + C_{s_2} - n},$$

which may be rearranged for n, giving

$$n = \frac{V_{s_1 s_2}\,(C_{s_1} + C_{s_2})}{1 + V_{s_1 s_2}}.$$

The probability of carcass overlap can then be approximated by the probability of selecting one of these n carcasses from source $s_1$ and then selecting the exact same carcass from source $s_2$:

$$\Pr(s_1 c_1 \equiv s_2 c_2 \mid s_1 c_1 \in h) \approx \frac{n}{C_{s_1}} \cdot \frac{1}{C_{s_2}} = \frac{V_{s_1 s_2}\,(C_{s_1} + C_{s_2})}{(1 + V_{s_1 s_2})\, C_{s_1} C_{s_2}}. \tag{11}$$

Owing to the lack of further detail, this coarse approximation disregards the condition that carcass $c_1$ from source $s_1$ is in batch h.
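The rearrangement and the approximation (11) can be checked with a short Python sketch; the function names and the carcass counts are hypothetical.

```python
"""Sketch of Eq. (11): carcass-overlap probability from the source overlap fraction V."""


def shared_carcasses(V: float, C1: int, C2: int) -> float:
    """Solve V = n / (C1 + C2 - n) for n."""
    return V * (C1 + C2) / (1.0 + V)


def overlap_prob(V: float, C1: int, C2: int) -> float:
    """(n / C1) * (1 / C2): pick a shared carcass in source 1, then the same one in source 2."""
    return shared_carcasses(V, C1, C2) / (C1 * C2)


if __name__ == "__main__":
    n = shared_carcasses(0.5, 100, 100)
    print(n, n / (200 - n))              # n is about 66.7; the ratio recovers V = 0.5
    print(overlap_prob(1.0, 100, 100))   # 1/C when the two sources share every carcass
```

The limiting cases behave as expected: V = 0 gives zero overlap probability, and V = 1 with equal source sizes gives 1/C, the chance of drawing the one matching carcass.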
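The probability that carcass c is present in batch j, given that carcass c from source s is present in the hot batch h, can now be determined. This probability is computed as one minus the product of the probabilities that carcass c is not in batch j via equivalence with any carcass in any source (including carcass c itself):

$$\begin{aligned} \Pr(c \in j \mid sc \in h) &= 1 - \prod_{i=1}^{S} \prod_{k=1}^{C_i} \left[1 - \Pr(sc \equiv ik \mid sc \in h)\, \Pr(ik \in j \mid sc \in h)\right] \\ &= 1 - \left[1 - \frac{P_{sc}(B_{sh}, B_{sj})}{P_{sc}(B_{sh})}\right] \prod_{i \ne s} \prod_{k=1}^{C_i} \left[1 - \Pr(sc \equiv ik \mid sc \in h)\, P_{ik}(B_{ij})\right]. \end{aligned} \tag{12}$$

In the last line of the above equation, the factor with i = s is separated out from the product over i, since its product over k collapses to a single factor (within source s, carcass c is equivalent only to itself). The conditional probability rule has also been used in this factor. Equation (12) may be computed using Equations (7), (8), and (11).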
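Once its inputs are available, (12) is a single running product. The Python sketch below takes those inputs as precomputed numbers; the argument names and the flat list of (overlap probability, presence in j) pairs, one per carcass k of every other source i, are hypothetical choices.

```python
"""Sketch of Eq. (12): probability that the hot carcass is also present in batch j.

Inputs are assumed precomputed from (7), (8) and (11); names are hypothetical.
"""
from typing import Iterable, Tuple


def prob_present_in_j(joint_hj: float, present_h: float,
                      other_sources: Iterable[Tuple[float, float]]) -> float:
    """Combine the same-source factor with one factor per carcass of every other source.

    joint_hj: P_sc(B_sh, B_sj) from (8); present_h: P_sc(B_sh) from (7);
    other_sources: pairs (overlap probability from (11), P_ik(B_ij) from (7)).
    """
    not_in_j = 1.0 - joint_hj / present_h   # same source: conditional probability rule
    for overlap, p_ik_in_j in other_sources:
        not_in_j *= 1.0 - overlap * p_ik_in_j
    return 1.0 - not_in_j
```

With no other sources this reduces to the conditional probability $P_{sc}(B_{sh}, B_{sj}) / P_{sc}(B_{sh})$; each overlapping carcass elsewhere can only increase the result.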

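2.7. Probability of other batches being contaminated. The probability that batch j is also contaminated, given that batch h is contaminated, is then found by 1) summing over all raw sources, weighting each by the probability that the source is hot, 2) within each source, summing over all carcasses, weighting each by the probability that the carcass is hot, and 3) multiplying each carcass term by the probability that the carcass is also present in batch j:

$$\Pr(j \text{ contaminated} \mid h) = \sum_{s=1}^{S} \Pr(s \text{ hot} \mid h) \sum_{c=1}^{C_s} \Pr(c \text{ hot} \mid s \text{ hot}, h)\, \Pr(c \in j \mid sc \in h). \tag{13}$$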
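The final combination (13) is a double weighted sum. This Python sketch assumes the three kinds of probabilities have already been computed; the nested-list layout (one inner list per source) is a hypothetical choice.

```python
"""Sketch of Eq. (13): contamination probability of batch j given hot batch h.

Inputs are assumed precomputed from (9), (10) and (12); layout is hypothetical.
"""
from typing import List


def contamination_prob(source_probs: List[float],
                       carcass_probs: List[List[float]],
                       presence_in_j: List[List[float]]) -> float:
    """sum_s P(s hot | h) * sum_c P(c hot | s hot, h) * P(c in j | sc in h)."""
    total = 0.0
    for p_s, pc_list, pj_list in zip(source_probs, carcass_probs, presence_in_j):
        total += p_s * sum(pc * pj for pc, pj in zip(pc_list, pj_list))
    return total


if __name__ == "__main__":
    # Hypothetical: two sources, the first with two candidate carcasses
    print(contamination_prob([0.75, 0.25], [[0.5, 0.5], [1.0]], [[1.0, 0.0], [0.4]]))
```

Taking j = h makes every presence probability one, so the result is one, as it should be for the hot batch itself.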
3. Results and discussion. The following two examples illustrate the above model. The first is a full synthetic data set for a fictitious beef processing plant; the second is based on a partial data set from genetic typing experiments done at an actual production plant. The data for the second example are used to provide estimates for the spread of a carcass within a raw source and to inform the model about the masses of raw source input to batches. Unfortunately, it is not possible to verify the model directly with data, since a processing facility clearly cannot deliberately contaminate its ground beef supply with E. coli. However, further experiments would be beneficial to help determine the spread of carcasses within a raw source, which will depend very much on the processes in place at any given facility.

Table 3. Source input mass, $m_{sb}$ (kg), and total fat percentage for the synthetic data set. For each batch, the nonzero inputs from Sources I-VII are listed in source order.
Batch | Nonzero source inputs (kg) | Fat %
1 | 312, 136, 552 | 25
2 | 384, 52, 564 | 25
3 | 114, 404, 260, 222 | 25
4 | 262, 239, 231, 268 | 25
5 | 201, 205, 89, 293, 212 | 25
6 | 320, 180, 292, 100, 108 | 15
7 | 407, 105, 284, 204 | 15
8 | 390, 456, 154 | 15
9 | 300, 205, 325, 170 | 15
10 | 209, 211, 543, 37 | 10
11 | 293, 132, 536, 39 | 10
12 | 318, 94, 540, 48 | 10
13 | 479, 454, 67 | 10
14 | 701, 226, 73 | 10

3.1. Synthetic data. These data are fictitious but based on typical values one might encounter in a large ground beef production facility. In this example, there are a total of S = 7 sources (three frozen lean, two fresh lean, and two fresh fat), and a total of B = 14 one-tonne batches of ground beef are produced. Relevant information for the sources is provided in Table 2. The carcass overlap fractions between different sources were taken as zero except for overlaps between the fresh sources (including $V_{46}$, $V_{56}$, and $V_{67}$, which are nonzero).

Table 2. Source information for the synthetic data set, grouped into frozen lean, fresh lean, and fresh fat sources.
The uniform distribution for F (K = 1, $H_0^{\pm} = H_1^- = 1/(2N_{1s} p_s a_s)$) was used for all raw sources in this example. The mass from each source used in each batch is provided in Table 3.
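As a quick consistency check on Table 3, the following Python snippet transcribes the nonzero inputs of each row (in source order, without assigning them to specific source columns) and confirms that every batch totals 1,000 kg, matching the stated one-tonne batches. The dictionary name is illustrative.

```python
"""Row sums of Table 3: nonzero source inputs (kg) and fat percentage per batch."""
TABLE3 = {
    1: ([312, 136, 552], 25),
    2: ([384, 52, 564], 25),
    3: ([114, 404, 260, 222], 25),
    4: ([262, 239, 231, 268], 25),
    5: ([201, 205, 89, 293, 212], 25),
    6: ([320, 180, 292, 100, 108], 15),
    7: ([407, 105, 284, 204], 15),
    8: ([390, 456, 154], 15),
    9: ([300, 205, 325, 170], 15),
    10: ([209, 211, 543, 37], 10),
    11: ([293, 132, 536, 39], 10),
    12: ([318, 94, 540, 48], 10),
    13: ([479, 454, 67], 10),
    14: ([701, 226, 73], 10),
}

batch_totals = {b: sum(masses) for b, (masses, _fat) in TABLE3.items()}
total_mass = sum(batch_totals.values())

if __name__ == "__main__":
    print(batch_totals)   # every batch sums to 1000 kg
    print(total_mass)     # 14000 kg over the B = 14 batches
```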
Before displaying the final results, we first illustrate some of the intermediate model computations for this example. The probability that a particular source is the hot source, given that batch h is hot, is determined by Equation (9). Given the values in Tables 2 and 3, this probability is easily computed and is reported in Table 4 for each source and each choice of hot batch. As can be seen from this table, the fresh fat sources typically have the highest probability of being the hot source, followed by the fresh lean sources. This is because both the susceptibility factor $g_s$ in Table 2 and the fat content $f_s$ are highest for the fat sources. For example, if Batch 11 is the hot batch, the probability that Source VII is the hot raw source from which the contaminated carcass came is above 25%, even though Source VII contributed only 39 of the 1,000 kg in the batch. The probability that a particular carcass within a raw source is the contaminated carcass, given that batch h is the hot batch, is given by Equation (10) and computed using Equations (1)-(7). For this example, using the uniform distribution as indicated, the results of this calculation for several hot batches h and for Sources II and VII are shown in Figure 4. Note from this figure that as the hot batch number increases, the carcass numbers with nonzero probability of being the hot carcass also shift to the right. The flat section of each probability curve arises because the carcass distribution functions chosen for this example are fairly narrow, so most carcasses in the central part of the mass contributed to the batch from this source have the same probability of being contaminated. Wider curves correspond to batches that take larger amounts of mass from the source. Results for other sources and other hot batches are computed similarly.
The probability that a particular carcass from a particular raw source that is present in batch h is also present in batch j is given by (12). Consider carcass number 100 in Source VI. From Tables 2 and 3, this source consists of 250 carcasses comprising 2000 kg, about the second quarter of which is input to Batch 2. Given the piece distribution function, it is not difficult to show that if there were no overlap of carcasses between sources, this carcass would be present only in Batch 2. However, since Source VI overlaps with Sources IV, V, and VII in this example ($V_{46}$, $V_{56}$, and $V_{67}$ are nonzero), there is a nonzero probability of this carcass appearing in other batches. Figure 5 displays this probability. Since $V_{46}$ is considerably larger than either $V_{56}$ or $V_{67}$, the probability of carcass 100 from Source VI being present in other batches is larger for the batches to which Source IV contributes significantly, namely Batches 3, 4, and 6 through 11. Using the full model, and letting each batch be the hot batch in turn, probabilities of contamination for each batch were computed. Complete results are given in Table 5, and a selection of these are plotted in Figure 6.
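The placement of carcass 100 can be checked with a line of arithmetic. Assuming carcass centres sit at $(c - 1/2) p_s a_s$, with $p_s a_s = M_s / C_s = 2000/250 = 8$ kg, carcass 100 is centred at 796 kg, inside the second quarter (500 to 1000 kg) of the source. The function name is hypothetical.

```python
"""Arithmetic check for carcass 100 of Source VI (250 carcasses, 2000 kg).

Assumes centres at (c - 1/2) * p_s * a_s with p_s * a_s = M_s / C_s.
"""


def carcass_centre(c: int, source_mass: float, n_carcasses: int) -> float:
    """Mass location of the centre of carcass c."""
    unit = source_mass / n_carcasses          # p_s * a_s
    return (c - 0.5) * unit


if __name__ == "__main__":
    centre = carcass_centre(100, 2000.0, 250)
    quarter = 2000.0 / 4
    print(centre, quarter <= centre < 2 * quarter)   # 796.0 True
```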
The likelihood of other batches being contaminated is highly dependent on the source input. In the example, if one of the early batches is the hot batch, then the likelihood of contamination is large only for batches near the hot batch. Conversely, if one of the last batches is the hot batch, then a greater number of other batches have a large probability of being contaminated. This is due to the source configuration shown in Tables 2 and 3. Since the contamination is most likely to be carried in the fatty sources (VI and VII), followed by the lean fresh sources (IV and V), the distribution of these sources across the batches is a primary contributor to the contamination probability distribution. The two other factors found to be very important were the values of $N_{1s}$ and $V_{s_1 s_2}$. When these spread indicators were small, the contamination was much more confined to batches near the hot batch. The overall result, then, is the unsurprising observation that to restrict contamination, one should restrict the spread of each carcass within the raw source and restrict the spread of the raw source across batches. In other words, production should be arranged so that each carcass is present in as few batches as possible.

Figure 6. Probability of contamination for each batch given that a fixed batch is contaminated (hot). The five separate curves correspond to Batches 2, 5, 8, 11, and 14 being the hot batch.

3.2. Real data. These data are from genetic typing experiments performed at an industrial ground beef production facility in 2012. The primary goal of these experiments was to see whether genetic typing could accurately determine the number of carcasses present in a ground beef batch and estimate the amount of overlap of genetic material from batch to batch. Samples of beef were taken from the final ground beef product and genetically analyzed to determine the number of different "profiles" (or distinct animals) present. These profiles were then compared against the profiles found from sampling other batches, and the number of matching profiles was reported. In addition, this data set provides some information on the sources input to each batch, but only partial information on mass amounts. The company where these genetic typing experiments were performed has not given permission to be named and the experimental results are not publicly available; hence these data are provided without reference.
Although these experiments were not designed specifically to help construct and validate this model, they do provide some information that can be used. Our model requires knowledge of the spread of a carcass across a raw source, but the samples for these data were taken from the final ground beef batches, not from the raw sources. Therefore, these data do not distinguish spread differences between raw sources. Nonetheless, the available information is used to estimate the spread of carcasses in all sources, assuming that all sources have identical spreads.
There were eight raw sources: four frozen lean, two fresh lean, and two fresh fat. In this experiment, 30 of 45 sequential batches were sampled and genetically profiled. Table 6 gives the number of distinct genetic profiles in each batch and the number that match profiles from other batches. The data in this table are interpreted as the weighted average of the conditional probability that a particular carcass is present in batch j given that it is present in batch h, the weights being the probability that the carcass is present in h. That is, if $T_{jh}$ is the entry from row j and column h of the table (although the table does not […]), then […] (14)

The above can be calculated using (7) and (12). Since the data are limited, providing information on carcass spread only through sums over the sources rather than on spread in individual sources, the spread in all sources is assumed, in order not to have too many parameters, to be described by the same two-piece (K = 2) continuous ($H_1^- = H_1^+$, $H_2^{\pm} = 0$) distribution function F. The data indicated that there were four suppliers, from whom Source I, Sources II-IV, Sources V-VI, and Sources VII-VIII originated, respectively. For this reason, only source pairs II and III, II and IV, III and IV, V and VI, and VII and VIII were allowed to have overlap of carcasses, and the overlap fraction was assumed to be the same for II and III, II and IV, and III and IV. The number of pieces, p, and the average piece mass, a, were also taken to be the same for each source. The parameters a, p, H, N, and V were chosen so that the computed values from the right side of (14) were as close as possible to the data values from the left side of (14), using a sum of squared differences as the minimization function. These results are shown in Table 7. The best fit to the data indicated that each carcass in a raw source contributes 8 pieces of average size 0.1 lb (0.045 kg) to that source, totalling 0.8 lb (0.364 kg). The carcass distributions are concentrated fairly tightly around their centres, but with long, low tails. Over 94% of the carcass distribution lies within about 26 carcasses (21 lb, 9.545 kg) on either side of its centre, with the remainder spreading out as far as 6202 carcasses (4,962 lb, 2,255 kg). These parameters for the carcass distributions were used in all eight of the model's raw sources. The data from Table 6 indicate profile matches between Batches 1-10 and 23-45 even though Table 8 shows there are no common raw sources for these two sets of batches. Consequently, there must be some overlap of carcasses across raw sources. The fitted overlap fractions indicate a substantial carcass overlap in Sources II-IV (three of the frozen lean sources) and between Sources VII and VIII (the two fresh fat sources), but no overlap between Sources V and VI (the two fresh lean sources).

Table 6. Number of profiles (on the diagonal) and profile matches between batches.

Table 7. Fitted carcass distribution parameters for the combined raw sources. The overlap fractions $V_{24}$ and $V_{34}$ are equal to $V_{23}$.
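The text states only that a sum of squared differences was minimized; the optimizer is not specified. The Python sketch below therefore uses a simple grid search as a stand-in, and `toy_model`, a one-parameter geometric decay of match fraction with batch distance, is a hypothetical placeholder for the right side of (14), not the paper's model.

```python
"""Sketch of a least-squares fit by exhaustive grid search.

Hypothetical: `toy_model` stands in for the right side of (14); the actual fit
used parameters a, p, H, N and V.
"""
from typing import Callable, Iterable, List, Sequence


def sse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Sum of squared differences between model output and data."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def grid_fit(model: Callable[[float], List[float]], observed: Sequence[float],
             candidates: Iterable[float]) -> float:
    """Return the candidate parameter with the smallest sum of squared differences."""
    return min(candidates, key=lambda theta: sse(model(theta), observed))


def toy_model(theta: float) -> List[float]:
    """Hypothetical match fraction at batch distances 1..5."""
    return [theta ** d for d in range(1, 6)]


if __name__ == "__main__":
    observed = toy_model(0.3)
    print(grid_fit(toy_model, observed, [i / 100 for i in range(101)]))   # 0.3
```

A grid search is easy to reason about but scales poorly with the number of parameters; with five or more parameters, as in the actual fit, a local optimizer or a coarse-to-fine grid would be more practical.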
Unfortunately, in the data available to us, only the mass input from each raw source to the first batch was recorded, together with an indication (without mass amounts) of which raw sources were used in subsequent batches from which samples were taken. Based on this, the mass input was estimated as shown in Table 8 for all 45 batches. No further information about the raw sources was provided by this data set, even though much of it could have been recorded. The remaining parameters needed for our model have been estimated (or derived from previous assumptions) and are given in Table 9. Of these, the susceptibility, $g_s$, and the fat content, $f_s$, were given reasonable but arbitrary values, chosen so that the fat content of the ground beef batches was about 10%. The mass of each source, $M_s$, and the number of carcasses in each source were calculated using the assumed mass inputs of Table 8 and the estimated values of p and a from Table 7.
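A minimal Python sketch of this last step, assuming $M_s$ is the total input drawn from source s over all batches and $C_s = M_s/(p a)$ from $M_s = C_s p_s a_s$, with the fitted p = 8 and a = 0.1 lb. The per-batch inputs in the usage line are hypothetical, not values from Table 8, and rounding to a whole number of carcasses is an illustrative choice.

```python
"""Sketch: source mass M_s and carcass count C_s = M_s / (p * a).

Uses the fitted p = 8 pieces and a = 0.1 lb; example inputs are hypothetical.
"""
from typing import Sequence


def source_mass(inputs_lb: Sequence[float]) -> float:
    """M_s: total mass drawn from source s over all batches."""
    return float(sum(inputs_lb))


def carcass_count(M_s_lb: float, p: int = 8, a_lb: float = 0.1) -> int:
    """C_s from M_s = C_s * p * a, rounded to a whole number of carcasses."""
    return round(M_s_lb / (p * a_lb))


if __name__ == "__main__":
    M = source_mass([300.0, 250.0, 250.0])   # hypothetical inputs, lb
    print(M, carcass_count(M))                # 800.0 1000
```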
Complete results are given in Tables 10 and 11, and a selection of these are plotted in Figure 7. As can be seen in these tables, the probability of contamination of batches immediately adjacent to the hot batch is typically about 10-15%. Batches further away have a decreasing chance of contamination, but the probability does not drop to zero until about 8-10 batches away from the hot batch. This is primarily due to the spread distribution of carcasses in the raw sources, which, as noted above, is very concentrated near the centre but has long, low tails. The fresh fat source changes from Source VII for Batches 1-20 to Source VIII for Batches 21-45, as shown in Table 8. Since much of the contamination probability is due to these fat sources (high fat content $f_s$ and relatively high susceptibility factors $g_s$; see Table 9), there is a noticeable asymmetric effect when the hot batch is near this switch. When Batch 20 is the hot batch, batches prior to 20 show a greater probability of also being contaminated than do batches after 20. This effect is present to a lesser degree when the hot batch is number 17, 18, or 19. Similarly, when the hot batch is 21, batches after 21 show a greater probability of contamination than those prior, and this effect is also seen to a lesser degree for hot batches 22 and 23. There is also evidence of "long-range" contamination spread. As shown in Tables 10 and 11 and indicated in their captions, Batches 21-45 have a small (about 2%) probability of contamination when the hot batch is in the range 1-20, and similarly Batches 1-20 have a small probability of contamination when the hot batch is in the range 21-45. This long-range contamination spread is primarily due to the carcass overlap between Sources VII and VIII, $V_{78} = 0.2$. As mentioned above, Source VII is input to the first 20 batches and Source VIII to Batches 21-45. Since these sources have a relatively high probability of being contaminated, their overlap causes long-range contamination spreading.

Table 9. Estimated and derived model parameters for the raw sources.

4. Conclusions. We believe that the proposed model may help to reduce economic losses in the beef industry. To our knowledge, this is a novel method for estimating the probability of E. coli contamination in the production of ground beef. Most of the input to the model is easily obtained from production records, or easily estimated (such as the average size of pieces). The most difficult input values to estimate are the within-source spread parameters, K, N, and $H^{\pm}$, and the across-source overlap parameters $V_{s_1 s_2}$. However, some knowledge of these values could be obtained through detailed observation of the plant processes or by using genetic sampling experiments in the raw sources as well as in the final ground beef product. The model has been implemented in C and is available on the author's web site http://www.uoguelph.ca/~awillms/grbeefcontam/. A useful next step would be to use an innocuous surrogate for pathogenic E. coli that could be applied to pieces from a particular carcass and then identified in the final product batches. This would allow better determination of carcass spread within a raw source and would provide data for direct validation of the model.
As a general statement, results from this model indicate that to restrict the spread of E. coli contamination, ground beef production processes should be designed to restrict the spread of pieces from a single carcass within a raw source, and limit the number of batches to which a given raw source provides input.