Embedding cover-free families and cryptographical applications

Cover-free families are set systems used as solutions for a large variety of problems, and in particular, problems where we deal with $n$ elements and want to identify $d$ invalid ones among them by performing only $t$ tests ($t \leq n$). We are especially interested in cryptographic problems, and we note that some of these problems need cover-free families with an increasing size $n$. Solutions that propose the increase of $n$, such as \emph{monotone families} and \emph{nested families}, have been recently considered in the literature. In this paper, we propose a generalization that we call \emph{embedding families}, which allows us to increase both $n$ and $d$. We propose constructions of \emph{embedding families} using polynomials over finite fields, and show specific cases where this construction allows us to prioritize the increase of $d$ or $n$ with good compression ratios. We also provide new constructions for monotone families with improved compression ratio. Finally, we show how to use embedded sequences of orthogonal arrays and packing arrays to build embedding families.


Introduction
A cover-free family (CFF) is a set system usually studied in the context of group testing applications. In this scenario, we are given a set of n elements and want to identify up to d invalid ones in a more efficient way than testing each one of them individually. A d-cover-free family (or d-CFF) will indicate how to group the n elements into t groups (t ≤ n), and by performing only t tests we will be able to identify up to d invalid elements. These families are used to solve several problems in cryptography, such as one-time and multiple-times digital signature schemes [22,15], fault-tolerant aggregation of signatures [8,9,21], modification localization on signed documents and redactable signatures [10], broadcast authentication [16], broadcast encryption [7], among others [13].
We can represent $d$-CFFs as set systems or their corresponding incidence matrices. A set system $\mathcal{F} = (X, \mathcal{B})$ consists of a set $X = \{x_1, \ldots, x_t\}$ with $|X| = t$, and a collection $\mathcal{B} = \{B_1, \ldots, B_n\}$ with $B_i \subseteq X$, $1 \leq i \leq n$, and $|\mathcal{B}| = n$. A $d$-cover-free family, denoted $d$-CFF$(t, n)$, is a set system such that for any subset $B_{i_0} \in \mathcal{B}$ and any other $d$ subsets $B_{i_1}, \ldots, B_{i_d} \in \mathcal{B}$, we have
$$\left| B_{i_0} \setminus \bigcup_{j=1}^{d} B_{i_j} \right| \geq 1.$$
Family $\mathcal{F}$ can be represented by its $t \times n$ binary incidence matrix $M$:
$$M_{i,j} = \begin{cases} 1, & \text{if } x_i \in B_j, \\ 0, & \text{otherwise.} \end{cases}$$
In the remainder of this paper, we may use the term $d$-CFFs to refer to their incidence matrices. For a basic reference in combinatorial group testing see [4], and for more information about combinatorial designs see [3]. Table 1 shows how a 2-CFF(9,12) can be used to test $n = 12$ elements and identify up to $d = 2$ invalid ones using $t = 9$ tests.

Table 1. Example of a 2-CFF(9,12) used in group testing.

The columns of the matrix represent the elements to be tested, and the rows indicate which elements we are testing together. After performing the 9 tests we obtain the last column with the results. If all the elements in a test are valid, the test passes (represented as 0), but if there is at least one defective element in a test, it fails (represented as 1). From the tests that pass we can identify all the valid elements, which in the example are 1, 2, 4, 5, 6, 7, 8, 9, 10, and 11. Since the remaining set of elements $S = \{3, 12\}$ has $|S| \leq d$, by the definition of $d$-CFF we can conclude that 3 and 12 are the defectives (since each of them is the only possible cause of failure in at least one of the tests $test_3$, $test_5$, $test_7$, $test_8$, $test_9$).
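The definition and the decoding rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the matrix of Table 1: the small 1-CFF(4,6) used here (columns are the 2-subsets of a 4-element set) and the function names `is_cff`, `run_tests` and `decode` are assumptions made for the example.

```python
from itertools import combinations

def is_cff(M, d):
    """Return True if the 0/1 matrix M (a list of rows) is d-cover-free."""
    n = len(M[0])
    # cols[j] is the set B_j of rows (tests) that contain item j.
    cols = [{i for i, row in enumerate(M) if row[j]} for j in range(n)]
    for j0 in range(n):
        others = [j for j in range(n) if j != j0]
        for group in combinations(others, d):
            covered = set().union(*(cols[j] for j in group))
            if cols[j0] <= covered:      # B_j0 is covered by d other sets
                return False
    return True

def run_tests(M, defective):
    """Simulate the tests: a test fails (1) if it contains a defective item."""
    return [int(any(row[j] for j in defective)) for row in M]

def decode(M, results):
    """Items appearing in a passing test are valid; the rest are reported."""
    n = len(M[0])
    valid = {j for row, r in zip(M, results) if r == 0
             for j in range(n) if row[j]}
    return sorted(set(range(n)) - valid)

# Hypothetical example: columns are the six 2-subsets of {0, 1, 2, 3}.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
results = run_tests(M, defective=[3])
print(is_cff(M, 1), is_cff(M, 2), decode(M, results))
```

With at most $d$ defectives and a $d$-CFF, `decode` returns exactly the defective items; with more defectives it may also report valid items, which is why the bound $d$ matters.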
CFFs provide a practical solution for problems where the number of elements n is known a priori. For applications where n is not known or can dynamically increase over time, we need a scheme that provides matrix growth. This can be done with a special sequence of d-CFFs, where the previous matrix is a sub-matrix of the next ones, so that we can reuse the groups and computations we already performed for smaller values of n. Monotone families [8] and nested families [9] of d-CFFs are examples of such special sequences that are used to achieve unbounded fault-tolerant aggregate digital signatures. One drawback of these families is that d must be constant, so we need more general sequences of families if we wish d to grow with n. For this purpose, in this paper, we define a generalization of both monotone and nested families called embedding families.
To compare the efficiency of different families of CFFs, we consider the compression ratio, which is given by $\rho(n)$ when $\frac{n}{t(n)}$ is $O(\rho(n))$. The compression ratio measures the efficiency gained from group testing, which performs $t(n)$ tests rather than $n$. We look for constructions with $\rho(n)$ as large as possible, for example, the ones that meet or are close to the information theoretical bound $\rho(n) = \frac{n}{(d^2/\log d)\log n}$ [6].
In the literature, monotone families with constant compression ratio have been given in [8], while several constructions of (the more general) nested families with compression ratio closer to the information theoretical bound were presented in [9]. Considering the limitation of constant d for monotone and nested families, the present paper gives constructions of embedding families with good compression ratios that allow d to grow with n.
Our contributions in this paper are as follows. We revisit a construction of $d$-CFF by Erdös et al. [5] based on polynomials over finite fields (Theorem 3.1) and highlight some useful properties related to progressive $d$ (Theorem 3.2). This property can be observed in the example in Table 2, where a matrix has submatrices with smaller $d$ inside it, which allows us to abort the testing early once enough tests are done for the actual level of defectives. We then give a general construction of embedding families of CFFs, each CFF based on this polynomial construction, and stacked together using extension fields (Theorem 3.4). Specific applications of this general construction give embedding families with sublinear $d = d(n)$ and $\rho(n) = n^{1-\frac{2}{k+1}}$ (Corollary 1) as well as with constant $d$ and $\rho(n) = \frac{n}{\log n}$, achieving the information theoretical upper bound (Corollary 2). Moreover, we show it is possible to adapt this construction to build monotone families with compression ratio $\rho(n) = n^{1-\frac{1}{k+1}}$, for each arbitrary constant $k \geq 1$ (Theorem 3.5), which is much superior to the constant compression ratios obtained in [8]. Finally, we show that families of orthogonal arrays and packing arrays with some specific properties generalize the polynomial construction of embedding families (Proposition 4), which can open the door for new constructions in the future.
In Section 2, we define embedding families and discuss cryptographical applications. In Section 3, we give constructions of embedding families based on polynomials over finite fields. In Section 4, we discuss the use of these constructions in applications, and challenges related to drop of actual compression ratios when columns are not used. In Section 5, we generalize the polynomial construction by using other combinatorial designs, and in Section 6, we give conclusions.

Embedding Sequences and its Applications
In this section, we present CFF constructions for unbounded applications, which are applications where n may not be known a priori or can grow over time. We introduce the notion of embedding families to be a sequence of CFFs that allows for the increase of n and d, and we also show how they are a generalization of nested families [9] and monotone families [8].
Definition 2.1 (Embedding family). Let $d(l)$ be a positive integer and let $(M^{(l)})_l$ be a sequence of incidence matrices of cover-free families $(\mathcal{F}_l)_l = (X_l, \mathcal{B}_l)_l$, where $M^{(l)}$ is a $d(l)$-CFF with number of rows and columns denoted by rows($l$) and cols($l$), respectively. $(M^{(l)})_l$ is an embedding family of incidence matrices of CFFs if $X_l \subseteq X_{l+1}$, rows($l$) $\leq$ rows($l+1$), cols($l$) $\leq$ cols($l+1$), $d(l) \leq d(l+1)$ and
$$M^{(l+1)} = \begin{pmatrix} M^{(l)} & Y \\ Z & W \end{pmatrix}.$$

We can see that monotone and nested families are special cases of embedding families. They allow us to increase $n$ for fixed $d$, with a $Z$ that has a special format. The definitions are shown below.

Definition 2.2 (Nested family). A nested family of incidence matrices of $d$-CFFs is an embedding family of incidence matrices with fixed $d$ such that each row of $Z$ is one of the rows of $M^{(l)}$, a row of all zeros, or a row of all ones.
Nested families were defined in [9] to solve a problem in unbounded aggregation of digital signatures, where three different constructions with increasing compression ratio are presented.

Definition 2.3 (Monotone family). A monotone family of incidence matrices of $d$-CFFs is an embedding family of incidence matrices with fixed $d$ such that $Z$ is a matrix of zeros.
Monotone families were introduced by Hartung et al. [8] to solve the problem of unbounded aggregation of signatures. They showed a concrete instantiation of monotone families with a constant compression ratio. We show in Theorem 3.5 a construction for monotone families with $\rho(n) = n^{1-\frac{1}{c}}$, for a constant $c$.
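The block structure in these definitions can be checked mechanically. The sketch below is an illustration under stated assumptions: the helper names `split_blocks`, `is_monotone_step` and `is_nested_step` are hypothetical, matrices are lists of 0/1 rows, and only the shape conditions on $Z$ are tested (not that each matrix is a CFF or that $d$ is non-decreasing).

```python
def split_blocks(M_small, M_big):
    """Return (Y, Z, W) if M_small is the top-left corner of M_big, else None."""
    r, c = len(M_small), len(M_small[0])
    if len(M_big) < r or len(M_big[0]) < c:
        return None
    if any(M_big[i][:c] != M_small[i] for i in range(r)):
        return None
    Y = [row[c:] for row in M_big[:r]]   # new columns, old rows
    Z = [row[:c] for row in M_big[r:]]   # old columns, new rows
    W = [row[c:] for row in M_big[r:]]   # new columns, new rows
    return Y, Z, W

def is_monotone_step(M_small, M_big):
    """Monotone condition: Z is a matrix of zeros."""
    blocks = split_blocks(M_small, M_big)
    return blocks is not None and all(v == 0 for row in blocks[1] for v in row)

def is_nested_step(M_small, M_big):
    """Nested condition: each row of Z is a row of M_small, all zeros, or all ones."""
    blocks = split_blocks(M_small, M_big)
    if blocks is None:
        return False
    c = len(M_small[0])
    allowed = [list(r) for r in M_small] + [[0] * c, [1] * c]
    return all(list(row) in allowed for row in blocks[1])

# Hypothetical toy matrices (identity matrices are trivially cover-free).
M1 = [[1, 0], [0, 1]]
M2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # Z = [[0, 0]]
M3 = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]   # Z = [[1, 1]]
print(is_monotone_step(M1, M2), is_nested_step(M1, M3), is_monotone_step(M1, M3))
```

Every monotone step is also a nested step, since a row of zeros is one of the allowed rows of $Z$.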

Cryptographical Applications.
We can think of a variety of applications for embedding families. General group testing applications, for example, may take advantage of this family for cases where increasing n is necessary, together with the possibility of larger d's. Here we are most interested in applications related to cryptography.

Aggregation of signatures: The purpose of aggregation of signatures is to save on storage, communication and verification time by combining several signatures together [1], and d-CFFs are known to provide this while allowing the identification of up to d invalid signatures [8,9,21]. Since the number of signatures may not be known a priori, it is important to have a d-CFF that allows the increase of n. However, after signatures are aggregated together using a smaller matrix, the individual signatures are discarded and we only keep the aggregated ones, which implies that larger matrices should not require the knowledge of those signatures that were discarded. A solution for this was first proposed by Hartung et al. [8] using monotone families, where the zero matrix below M (l) addresses this problem directly. Nested families can also be used as a solution to this problem, since their submatrix Z also addresses this problem by requiring only one extra aggregation of past signatures [9]. The advantage of nested families is that the known constructions present a much better compression ratio, which is closer to the information theoretical bound, and consequently give a smaller aggregate signature size [9].

Broadcast encryption: In this scheme, a sender broadcasts encrypted messages to a set of n users, but wants to prevent some of them from recovering these messages. An example of such an application is paid television, where only some users can have access to certain paid channels [7]. Gafni et al. [7] propose the use of d-CFFs for distributing the keys that are used to encrypt and decrypt the message.
In this scenario, the columns of the d-CFF represent the users, and the rows represent a set of t keys. Each user receives a subset of the keys according to their column, and the d-CFF property guarantees that we can remove up to d users and their respective keys without compromising the ability of the remaining users to decrypt the content. In this scheme, an embedding family would provide a fully scalable scheme [7], where we can add new users by increasing n, and additionally handle a larger number d of users that may be removed.
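The key-distribution idea can be illustrated with a small sketch. It models only the combinatorial assignment of keys to users, not the encryption scheme of [7]. The 2-CFF(9,9) used here is built from lines $f(x) = a + bx$ over the prime field $\mathbb{F}_3$, and the names `key_ring` and `safe_keys` are assumptions made for the example.

```python
Q = 3  # a prime, so the integers mod Q form the field F_3

# Users are the polynomials f(x) = a + b*x; keys are the pairs (x, y).
users = [(a, b) for a in range(Q) for b in range(Q)]

# Each user holds the keys (x, f(x)), i.e. the 1-entries of its column.
key_ring = {(a, b): {(x, (a + b * x) % Q) for x in range(Q)} for (a, b) in users}

def safe_keys(user, revoked):
    """Keys held by `user` that are held by none of the revoked users."""
    exposed = set().union(*(key_ring[r] for r in revoked))
    return key_ring[user] - exposed

# Revoking any 2 users leaves every other user with at least one safe key.
print(safe_keys((0, 0), [(0, 1), (1, 2)]))
# Revoking 3 users exceeds d = 2 and can leave a user with no safe key.
print(safe_keys((0, 0), [(0, 1), (1, 2), (1, 1)]))
```

The second call shows why the bound $d$ matters: three revoked users can jointly hold all the keys of a remaining user.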
Broadcast authentication: In this scheme, sender and receivers agree on secret keys, and these keys are used to guarantee the authenticity of broadcasted messages.
However, there may be malicious users who can get together and use their secret keys and previous communication to create fraudulent messages, which may be accepted by some users as authentic [13,16]. Safavi-Naini and Wang [16] propose the use of d-CFFs to manage the distribution of keys. Again, the columns of the CFF represent the users, the rows represent the keys, and each user receives a subset of these keys corresponding to their column of the matrix. Because of the d-CFF, the union of the keys of up to d malicious users is not enough to create a fraudulent message [13,16]. In this scenario, we could again think of embedding families as a way to provide an increase in the number of receivers n and malicious users d that the system can handle.

Embedding Sequences Using Polynomials Over Finite Fields
In this section, we present a construction for embedding sequences based on a known construction of $d$-CFFs. We start by presenting a construction proposed by Erdös, Frankl and Füredi [5], that uses polynomials of degree up to $k$ over a finite field $\mathbb{F}_q$, denoted as $f \in \mathbb{F}_q[x]_{\leq k}$, and generates a $d$-CFF$(t = q^2, n = q^{k+1})$ for $d \leq \frac{q-1}{k}$. We also note that this known construction presents some interesting properties that allow us to ignore a few rows of the $d$-CFF if we need smaller values of $d$ (see Theorem 3.2). We finally show how to use this polynomial construction to obtain embedding families, and how we can focus on prioritizing increases of $d$ or $n$, and obtain monotone families with increasing compression ratio from it.
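Construction 1 (Erdös et al. [5]). Let $q$ be a prime power, $k$ a positive integer, and consider the elements of the finite field as $\mathbb{F}_q = \{x_1, \ldots, x_q\}$. We define $(X, \mathcal{B})$ as follows, for each polynomial $f \in \mathbb{F}_q[x]_{\leq k}$:
$$X = \mathbb{F}_q \times \mathbb{F}_q, \qquad B_f = \{(x_1, f(x_1)), \ldots, (x_q, f(x_q))\} \subseteq X, \qquad \mathcal{B} = \{B_f : f \in \mathbb{F}_q[x]_{\leq k}\}.$$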
The argument in the following proof was observed in [8,15].
Table 2. Example of a 1-CFF(6,9) and a 2-CFF(9,9).

We note that the d-CFF construction presented above has a special structure that can be exploited to guarantee some interesting properties. Here we focus on [...] allowed by the d-CFF matrix. In the example above, for instance, if we discard the last three rows we obtain a 1-CFF(6,9).
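The following sketch builds the polynomial matrix for a prime $q$ and checks the behaviour described above for $q = 3$, $k = 1$. It assumes the reading of Construction 1 in which row $(x, y)$ meets column $f$ exactly when $f(x) = y$, restricts to prime $q$ so that arithmetic mod $q$ is field arithmetic, and orders rows by blocks; the function names are hypothetical and the ordering need not match Table 2.

```python
from itertools import combinations, product

def poly_cff(q, k, blocks=None):
    """0/1 matrix from polynomials of degree <= k over F_q (q prime).
    Rows are pairs (x, y), grouped into blocks by x; columns are polynomials.
    `blocks` keeps only the first few blocks of rows (all q by default)."""
    blocks = q if blocks is None else blocks
    polys = list(product(range(q), repeat=k + 1))   # coefficient tuples

    def ev(f, x):
        return sum(c * pow(x, e, q) for e, c in enumerate(f)) % q

    return [[int(ev(f, x) == y) for f in polys]
            for x in range(blocks) for y in range(q)]

def is_cff(M, d):
    """Return True if no column of M is covered by the union of d others."""
    n = len(M[0])
    cols = [{i for i, row in enumerate(M) if row[j]} for j in range(n)]
    for j0 in range(n):
        others = [j for j in range(n) if j != j0]
        for group in combinations(others, d):
            if cols[j0] <= set().union(*(cols[j] for j in group)):
                return False
    return True

full = poly_cff(3, 1)        # 9 x 9, expected to be a 2-CFF
reduced = full[:6]           # discard the last block of three rows
print(is_cff(full, 2), is_cff(reduced, 1), is_cff(reduced, 2))
```

The reduced matrix keeps $dk + 1 = 2$ blocks, which is enough for $d = 1$ but not for $d = 2$.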
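For the remainder of this paper we consider a block of rows in this construction as the set of $q$ rows $\{(x_i, x_1), (x_i, x_2), \ldots, (x_i, x_q)\}$ for every $x_i \in \mathbb{F}_q$. When we restrict our matrix to $i$ blocks of rows, we are considering the sets $B_f(i) = \{(x_1, f(x_1)), \ldots, (x_i, f(x_i))\}$ and the collection $\mathcal{B}(i) = \{B_f(i) : f \in \mathbb{F}_q[x]_{\leq k}\}$.

Construction 2. Let $q$ be a prime power, $k \geq 1$, and $q \geq dk + 1$. Let $C_{q,k,d}$ be the matrix corresponding to $\mathcal{B}(dk + 1)$, or in other words, the matrix $C_{q,k}$ from Construction 1 restricted to the first $(dk + 1)$ blocks of rows.

Theorem 3.2. Let $q$ be a prime power, $k \geq 1$, and $q \geq dk + 1$. Then $C_{q,k,d}$ is a $d$-CFF$((dk + 1)q, q^{k+1})$.

Proof. The proof follows a similar argument as for Theorem 3.1. Since two distinct polynomials of degree at most $k$ agree on at most $k$ points, we have $|B_{f_i}(a) \cap B_{f_j}(a)| \leq k$ for all distinct $f_i, f_j \in \mathbb{F}_q[x]_{\leq k}$ and $1 \leq a \leq q$. Taking any $d + 1$ distinct sets $B_{f_0}(dk+1), \ldots, B_{f_d}(dk+1)$, we get
$$\left| B_{f_0}(dk+1) \setminus \bigcup_{j=1}^{d} B_{f_j}(dk+1) \right| \geq (dk + 1) - dk = 1,$$
so $\mathcal{B}(dk+1)$ has the $d$-cover-free property and $C_{q,k,d}$ is a $d$-CFF$((dk + 1)q, q^{k+1})$.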
For the case of k = 1, we observe that this incremental d property was given in [14]. This is because the constructions presented in [14] are based on Mutually Orthogonal Latin Squares (MOLS), which can be constructed with polynomials of degree k = 1.

Embedding Sequence Construction.
In this section, we give constructions of embedding sequences of CFFs using the previous construction and extension fields. We start with a prime power $q$ and consider the increase as $q^{2^i}$ for $i \geq 0$, which gives a direct increase of $n$ and $t$. Since we are increasing $q$, we may also consider increasing $k$ and/or $d$ as long as we respect the inequality $q \geq dk + 1$. By increasing $k$ to some $k'$ we make $n$ grow faster and consequently improve the compression ratio. By increasing $d$ to $d'$ we can allow the identification of more defective elements, which may be necessary as the number of elements $n$ grows.
The following theorem is the basic step to be used in the embedding sequence construction.

Theorem 3.3. Let $q \geq dk + 1$, $k' \geq k$, $d' \geq d$ and $q^2 \geq d'k' + 1$. Let $C_{q,k,d}$ and $C_{q^2,k',d'}$ be the CFF matrices obtained from the polynomial construction (Construction 2). Then, there exists $\overline{C}_{q^2,k',d'}$ obtained from $C_{q^2,k',d'}$ by a column and row permutation that has the form
$$\overline{C}_{q^2,k',d'} = \begin{pmatrix} C_{q,k,d} & Y \\ Z & W \end{pmatrix}.$$
Moreover, $\overline{C}_{q^2,k',d'}$ is a $d'$-CFF$((d'k' + 1)q^2, q^{2(k'+1)})$ and $C_{q,k,d}$ is a $d$-CFF$((dk + 1)q, q^{k+1})$.
Proof. To form $\overline{C}_{q^2,k',d'}$ we first list the rows of $C_{q^2,k',d'}$ that are of the form $(x_i, x_j)$ for all $1 \leq i \leq dk + 1$, $1 \leq j \leq q$ and its columns indexed by $B_f$ for all $f \in \mathbb{F}_q[x]_{\leq k}$, followed by the remaining rows and columns in some order. Since $\mathbb{F}_q$ is a subfield of $\mathbb{F}_{q^2}$ and $k \leq k'$, we have $\mathbb{F}_q[x]_{\leq k} \subseteq \mathbb{F}_{q^2}[x]_{\leq k'}$, so we can list the columns starting with all $f \in \mathbb{F}_q[x]_{\leq k}$, and the evaluations of polynomials in $\mathbb{F}_q$ and $\mathbb{F}_{q^2}$ give the same result. Thus, the $(dk + 1)q \times q^{k+1}$ submatrix of $\overline{C}_{q^2,k',d'}$ in the upper left corner coincides precisely with $C_{q,k,d}$. The fact that they are $d$-CFF and $d'$-CFF comes from Theorem 3.2.
Then, the sequence $\{C_{q^{2^i},k_i,d_i}\}_{i \geq 0}$ is an embedding family of CFFs.
Corollary 1 (Prioritizing d increase). Let $d_0 \geq 1$, $k \geq 1$, and let $q$ be a prime [...] is an embedding family of CFFs. Moreover, its compression ratio is $\rho(n) = n^{1-\frac{2}{k+1}}$ and $d \sim \frac{n^{1/(k+1)}}{k}$.

Proof. We have that $d_i = \frac{q^{2^i}}{k} - 1 < \frac{q^{2^i}}{k}$ and therefore $d_i k < q^{2^i}$, which satisfies the hypothesis of Theorem 3.4. Finally, for fixed $k$ and assuming the use of all rows of the matrix, we easily calculate the compression ratio
$$\frac{n}{t} = \frac{(q^{2^i})^{k+1}}{(q^{2^i})^{2}} = \frac{n}{n^{2/(k+1)}} = n^{1-\frac{2}{k+1}},$$
which is increasing when $k \geq 2$.
We show a few examples in Table 4 and Table 5 for $q = 4, 16, 256, 65536$ and for fixed values of $k$. For each $q$ and $k$ we compute $d = \frac{q}{k} - 1$ and $n = q^{k+1}$. We note that as $k$ increases, the maximum value of $d$ decreases but we get constructions with a better ratio.
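Parameters of this kind can be tabulated with a short script. The sketch below is not a reproduction of Tables 4 and 5: it assumes $d$ is taken as the largest integer satisfying $q \geq dk + 1$ and that all $q$ blocks of rows are used ($t = q^2$), as in the ratio computed in the proof of Corollary 1. The function name `params` is hypothetical.

```python
def params(q, k):
    """Return (d, n, t, n / t) for the full polynomial matrix over F_q."""
    d = (q - 1) // k      # largest d with q >= d*k + 1 (an assumption)
    n = q ** (k + 1)      # number of polynomials of degree <= k
    t = q * q             # all q blocks of q rows
    return d, n, t, n / t

for k in (1, 2, 3):
    for q in (4, 16, 256, 65536):
        d, n, t, ratio = params(q, k)
        print(f"k={k} q={q:>5} d={d:>5} n={n} t={t} n/t={ratio:g}")
```

For fixed $k$ the ratio equals $q^{k-1}$, so it stays at 1 for $k = 1$ and grows with $q$ only when $k \geq 2$, in line with the remark that the ratio is increasing when $k \geq 2$.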
Corollary 2 (Prioritizing ratio increase). Let $d \geq 1$ and $k_0 \geq 1$, $q$ a prime power [...] is an embedding family of CFFs. Moreover, the compression ratio is $\rho(n) = \frac{n}{\log n}$.
Proof. We have that $k_i = \frac{q^{2^i}}{d} - 1 < \frac{q^{2^i}}{d}$, and therefore $k_i d < q^{2^i}$, which satisfies the hypothesis of Theorem 3.4. Finally, for fixed $d$ and assuming the use of all rows of the matrix, we easily obtain the compression ratio $\frac{n}{t} = (q^{2^i})$ [...]

We show some examples in Table 6 and Table 7 for $q = 4, 16, 256, 65536$, fixed values of $d$, and increasing $k$. For each $q$ and $d$ we compute $k = \frac{q}{d} - 1$ and $n = q^{k+1}$. We note that the ratio grows very quickly as $k$ increases to its maximum.
Monotone families are desirable for some applications due to their flexibility since the new tests involve only new items. By selecting specific blocks of rows for the embedding family, we are able to achieve monotone families with increasing compression ratio, which was not known in the literature [8,9].
Theorem 3.5. Let $d \geq 1$ and $k \geq 1$, $q$ a prime power such that $q \geq dk + 1$. Let $C_{q,k,d}$ be a $d$-CFF obtained from Construction 1 and $\{\overline{C}_{q^{2^i},k,d}\}_{i \geq 0}$ be an embedding family of $d$-CFFs for fixed $k$ and $d$, obtained from recursively applying Theorem 3.3, where $\overline{C}_{q,k,d} = C_{q,k,d}$ and $\overline{C}_{q^{2^i},k,d}$ is the reordered matrix as shown in Theorem 3.3. Consider $M_{q^{2^i},k,d}$ the submatrix of $\overline{C}_{q^{2^i},k,d}$ corresponding to rows indexed by $(x_l, x_j)$ where $x_l \in \mathbb{F}_q$, $l = 1, \ldots, dk + 1$, and $x_j \in \mathbb{F}_{q^{2^i}}$. Then $\{M_{q^{2^i},k,d}\}_{i \geq 0}$ is a monotone family of $d$-CFF$(t = (dk + 1)q^{2^i}, n = q^{2^i(k+1)})$. Moreover, the compression ratio is $\rho(n) = n^{1-\frac{1}{k+1}}$.
Proof. By fixing $x_l \in \mathbb{F}_q$, $l = 1, \ldots, dk + 1$, we obtain a matrix with $dk + 1$ blocks of rows and consequently $|B_f| = dk + 1$, and by the same argument as in Theorem 3.2 we know that each matrix $M_{q^{2^i},k,d}$ is a $d$-CFF with $d \leq \frac{q-1}{k}$. Moreover, if we look at the columns of $M_{q^{2^i},k,d}$ that are represented by polynomials $f \in \mathbb{F}_{q^{2^{i-1}}}[x]_{\leq k}$, we know $f(x_l) = x_j \in \mathbb{F}_{q^{2^{i-1}}}$, and consequently $M_{(x_l,x_j),f} = 0$ for all the cases where $x_j \notin \mathbb{F}_{q^{2^{i-1}}}$. It is easy to see that this matches the definition of monotone family of $d$-CFFs. Finally, if we use the maximum $d = \frac{q-1}{k}$ we obtain a sequence of $d$-CFF$(t = q \times q^{2^i}, n = (q^{2^i})^{k+1})$, which has compression ratio
$$\frac{n}{t} = \frac{(q^{2^i})^{k+1}}{q \cdot q^{2^i}} = \frac{n^{1-\frac{1}{k+1}}}{q}.$$

For an example of Theorem 3.5 with $q = 3$, $d = 2$, $k = 1$, we refer to Table 3 to obtain the first two matrices in the sequence. Indeed, $M_{3,1,2}$ is the top left submatrix, and $M_{9,1,2}$ is the given matrix restricted to the first $dk + 1 = 3$ blocks of rows (first two groups of rows in Table 3), and the columns corresponding to polynomials of degree up to $k = 1$. The ratio of this monotone family is $\rho(n) = \sqrt{n}/3$.
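The case $q = 3$, $d = 2$, $k = 1$ is small enough to check by computer. The sketch below is an illustration under assumptions: $\mathbb{F}_9$ is modelled as pairs $(a, b)$ standing for $a + b\,i$ with $i^2 = -1$ over $\mathbb{F}_3$, the row and column orderings are chosen for convenience and need not match Table 3, and all function and variable names are hypothetical.

```python
from itertools import combinations, product

P = 3  # F_9 is modelled as {a + b*i : a, b in F_3} with i*i = -1

def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def mul(u, v):
    # (a + b*i)(c + d*i) = (ac - bd) + (ad + bc)*i
    return ((u[0] * v[0] - u[1] * v[1]) % P, (u[0] * v[1] + u[1] * v[0]) % P)

F3 = [(a, 0) for a in range(P)]                             # the subfield
F9 = F3 + [(a, b) for b in range(1, P) for a in range(P)]   # subfield listed first

def build(rows, polys):
    """Entry is 1 when the polynomial c0 + c1*x takes the value y at row (x, y)."""
    return [[int(add(c0, mul(c1, x)) == y) for (c0, c1) in polys]
            for (x, y) in rows]

def is_cff(M, d):
    """Return True if no column of M is covered by the union of d others."""
    n = len(M[0])
    cols = [{i for i, row in enumerate(M) if row[j]} for j in range(n)]
    for j0 in range(n):
        others = [j for j in range(n) if j != j0]
        for group in combinations(others, d):
            if cols[j0] <= set().union(*(cols[j] for j in group)):
                return False
    return True

# Rows (x, y) with x in F_3; rows and columns over the subfield come first.
small_rows = list(product(F3, F3))
small_polys = list(product(F3, F3))
big_rows = small_rows + [r for r in product(F3, F9) if r not in small_rows]
big_polys = small_polys + [f for f in product(F9, F9) if f not in small_polys]

M3 = build(small_rows, small_polys)   # 9 x 9
M9 = build(big_rows, big_polys)       # 27 x 81
top_left = [row[:9] for row in M9[:9]]
Z = [row[:9] for row in M9[9:]]       # new rows restricted to old columns

print(top_left == M3, all(v == 0 for row in Z for v in row), is_cff(M9, 2))
print(len(M9[0]) / len(M9))           # n / t = 81 / 27
```

The block `Z` is zero because a polynomial with coefficients in $\mathbb{F}_3$ evaluated at a point of $\mathbb{F}_3$ stays in $\mathbb{F}_3$, which is the argument used in the proof.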

Using Embedding Sequences in Applications
The use of embedding families given in Theorem 3.4 requires some caution. While compression ratios are excellent when each full matrix is used, as seen in Corollary 1, Corollary 2, and Theorem 3.5, bad ratios can be found when we need to add far fewer items than the maximum $n$ for a matrix. Note that when the number of columns of $M^{(l)}$ is exceeded we need to use $M^{(l+1)}$ and remove unused columns. For example, with $q = 4$, $d = 1$, $k = 2$ we get a maximum $n = 64$, $t = (dk + 1)q = 12$ and ratio $\rho(n) = 5.33$. If we decide to use the extension field to get larger values of $n$, the next value will be $q = 16$, which gives $t = (dk + 1)q = 48$. This new matrix can handle up to $n = 4096$, but for the case where we only need $n = 65$ we will have a very small ratio of $\rho(n) = 1.35$. For this reason it would be desirable to develop techniques for "smoothing out" the transition in compression ratio when we move from one matrix to the next in the embedding family.
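The arithmetic in this example can be reproduced with a small helper. This is a sketch with a hypothetical function name, `ratio`, using $t = (dk + 1)q$ as in the text and allowing only part of the $q^{k+1}$ columns to be used.

```python
def ratio(n, q, d, k):
    """Compression ratio n / t when n of the q**(k+1) columns are used."""
    if n > q ** (k + 1):
        raise ValueError("n exceeds the number of columns of this matrix")
    t = (d * k + 1) * q       # dk + 1 blocks of q rows
    return n / t

print(round(ratio(64, 4, 1, 2), 2))     # full matrix over F_4
print(round(ratio(65, 16, 1, 2), 2))    # one extra item forces the move to F_16
print(round(ratio(4096, 16, 1, 2), 2))  # full matrix over F_16
```

The drop between the first two values is the sharp transition discussed above; the ratio recovers only as $n$ approaches the capacity of the larger matrix.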
One strategy to reduce these sharp transitions is to use values much smaller than the maximum allowed by the construction. In Figure 1 and Table 8 we show a choice of $q = 16, 256$, $d = \log_4 n$, and necessary increases of $k = 1, 2, 3$ to achieve the desired values of $n$. We note that as we change from one field ($q = 16$) to the next one ($q = 256$) we have a drop in the compression ratio, and this is due to the increase in the number of rows $(dk + 1)q$ as we increase $q$. As $n$ grows, the increasing ratio is restored.

Generalized Construction of Embedding Families
In this section, we present a generalization of the results presented in Section 3. We start by defining a few objects that will be used, such as Orthogonal Arrays (OAs), Packing Arrays (PAs), and their relationships with separating hash families (SHF). Then we discuss how they can be used to construct embedding families.
Definition 5.1. An orthogonal array OA$(v^t; t, k, v)$ is a $v^t \times k$ array with elements from an alphabet of $v$ symbols, such that in any $t$ columns, every $t$-tuple of points is contained in exactly one row.

The next figure shows an example of an SHF$(2; 6, 4, \{1, 2\})$, based on a construction given by Li, Van Rees and Wei [12].

Now we present some relationships between packing arrays and separating hash families, and how we can use them to construct CFFs and embedding families. In the following propositions, we consider a PA$(n; t, k, v)$ and the fact that any two rows have at most $t - 1$ positions in common. A similar result is presented by Stinson et al. [18], where they propose the use of orthogonal arrays to construct perfect hash families, and mention that similar results can be achieved for SHFs.

Proposition 1. If $A$ is a PA$(n; t, k, v)$, and $w \leq \frac{k-1}{t-1}$, then $A^T$ is a SHF$(N = k; n, m = v, \{1, w\})$.
We claim there exists a column $j \in T = \{1, \ldots, k\} \setminus \bigcup_{i=1}^{w} B_i$, which is the desired column in (2). In fact, $|T| \geq k - w(t - 1) \geq 1$ since $w \leq \frac{k-1}{t-1}$, so column $j$ exists, which becomes row $j$ when using $A^T$, and therefore guarantees the necessary and sufficient property for a SHF of type $\{1, w\}$.
Now we show we can construct $d$-CFF incidence matrices from SHFs. Similar results can be found in [11] from perfect hash families, and a more general result can be found in [19] for the construction of $(w, d)$-CFFs from SHFs of type $\{w, d\}$.

Proof. If we take $w + 1$ columns $c_0, c_1, \ldots, c_w$ of a SHF $A$ of type $\{1, w\}$, we know there will be a row $i$ such that $A[i, c_0] \neq A[i, c_j]$ for $j \in \{1, \ldots, w\}$. In Construction 3 we convert each element $x$ in $A$ into a vector of size $m$ with one "1" in position $x$ and "0" in the remaining positions. Due to the SHF property, we note that there is at least one row $(i, x)$ such that $M_{(i,x),c_0} = 1$ while $M_{(i,x),c_j} = 0$ $(1 \leq j \leq w)$, for any columns $c_0, c_1, \ldots, c_w$ in $M$, which matches the requirements for a $w$-CFF. Moreover, since we expand each row of $A$ into an array of size $m$, $M$ is a $w$-CFF$(mN, n)$.
Remark 2. If we use a PA$(n; t, k, v)$ to build a SHF$(N = k; n, m = v, \{1, w\})$ with $w \leq \frac{k-1}{t-1}$ as in Proposition 1 and then apply Construction 3, we obtain a $d$-CFF$(mN, n)$ with $d \leq \frac{k-1}{t-1}$.
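The expansion step described in the proof above (each symbol $x$ becomes a 0/1 vector of size $m$ with a single 1 in position $x$) can be sketched as follows. The function names are hypothetical, symbols are numbered from 0 rather than 1, and the example array, whose row $i$ lists the values $f(i)$ of the lines $f(x) = a + bx$ over $\mathbb{F}_3$, is an assumption chosen because any two of its columns agree in at most one row.

```python
from itertools import combinations

def shf_to_cff(A, m):
    """Expand an N x n array A over symbols 0..m-1 into an (m*N) x n 0/1 matrix.
    Row (i, x) of the result has a 1 in column c exactly when A[i][c] == x."""
    return [[int(a == x) for a in row] for row in A for x in range(m)]

def is_cff(M, d):
    """Return True if no column of M is covered by the union of d others."""
    n = len(M[0])
    cols = [{i for i, row in enumerate(M) if row[j]} for j in range(n)]
    for j0 in range(n):
        others = [j for j in range(n) if j != j0]
        for group in combinations(others, d):
            if cols[j0] <= set().union(*(cols[j] for j in group)):
                return False
    return True

# Hypothetical 3 x 9 array: columns are lines a + b*x over F_3, rows are points.
lines = [(a, b) for a in range(3) for b in range(3)]
A = [[(a + b * i) % 3 for (a, b) in lines] for i in range(3)]

M = shf_to_cff(A, m=3)    # 9 x 9 matrix, rows indexed by (i, x)
print(len(M), len(M[0]), is_cff(M, 2), is_cff(M, 3))
```

Here $k = 3$ and $t = 2$ in the notation of Remark 2, so the bound $d \leq \frac{k-1}{t-1} = 2$ agrees with the check.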

Remark 3.
For the special case where we use an OA$(q^t; t, q, q)$ constructed using polynomials over finite fields using Bush's construction [2], $q$ a prime power, to build a SHF$(q; q^t, q, \{1, w\})$ and then apply Construction 3, we have a $d$-CFF$(q^2, q^t)$ for $d \leq \frac{q-1}{t-1}$, which is equivalent to Construction 1 using polynomials. When we construct a $d$-CFF from a SHF that came from a PA, we observe a similar property of blocks of rows giving increasing values of $d$ as in Theorem 3.2.
Proposition 3. Let $P$ be a PA$(n; t, k, v)$, $A = P^T$ be a SHF$(N = k; n, m = v, \{1, w\})$ with $w \leq \frac{k-1}{t-1}$, and $k_i = i(t - 1) + 1$. Consider $M$ the $w$-CFF$(k \times v, n)$ obtained from Construction 3 using $A$ and let a "block" of rows be any consecutive $m$ rows indexed by $(i, 1), \ldots, (i, m)$. When we restrict $M$ to any $k_i$ blocks of rows we obtain an $i$-CFF$(k_i \times v, n)$.
Proof. From Proposition 1 we know that a SHF of type $\{1, w\}$ can be created from a PA $P$ of strength $t$ as long as the number of columns of the PA is $k \geq w(t - 1) + 1$. We can restrict the packing array $P$ to $k_i = i(t - 1) + 1$ columns, $1 \leq i \leq w$, without compromising its properties and therefore obtain a SHF $A_i$ of type $\{1, i\}$. By applying Construction 3 with $A_i$ we obtain an $i$-CFF$(k_i \times v, n)$.
The next proposition shows that a special sequence of PAs generalizes the polynomial construction of embedding family given in Theorem 3.4.

Proposition 4. Let $(P^{(l)})_l$ be a sequence of PAs, where $P^{(l)}$ is a PA$(n_l; t_l, k_l, v_l)$, $n_l \leq n_{l+1}$, $t_l \leq t_{l+1}$, $k_l \leq k_{l+1}$, $v_l \leq v_{l+1}$, and
$$P^{(l+1)} = \begin{pmatrix} P^{(l)} & Y \\ Z & W \end{pmatrix},$$
and in addition $d_l \leq \frac{k_l - 1}{t_l - 1}$, $d_l \leq d_{l+1}$, for all $l \geq 1$. Then there exists an embedding family of $d_l$-CFF$(k_l \times v_l, n_l)$.
Proof. Let $A^{(l)}$ be obtained by transposing $P^{(l)}$. By Proposition 1, since $d_l \leq \frac{k_l - 1}{t_l - 1}$, $A^{(l)}$ is a SHF$(N_l = k_l; n_l, m_l = v_l, \{1, d_l\})$, and consequently $(A^{(l)})_l$ is a sequence of SHFs with the following form
$$A^{(l+1)} = \begin{pmatrix} A^{(l)} & Z^T \\ Y^T & W^T \end{pmatrix}.$$
Then, for each $A^{(l)}$ we apply Construction 3 and we get $M^{(l)}$, a $d_l$-CFF$(k_l \times v_l, n_l)$ by Proposition 2. It is easy to see that the smaller CFF $M^{(l)}$ is in the top left corner of $M^{(l+1)}$, and all the requirements for being an embedding family are satisfied.

Conclusion
This paper introduces the idea of embedding families of CFFs as a general framework to look at how CFF constructions can be leveraged to optimize parameters of interest to applications. The infinite families obtained in Section 3 have excellent asymptotic compression ratios, some matching the information theoretical upper bound, and permit increase of d and n. However, these constructions present abrupt increases of t and n (when moving to the next q) that need to be "smoothed out" for improved use in applications, as discussed in Section 4. An important direction for future work is the study of adequate growth for d and k to yield smoother instances of these families.