A comparison of vakonomic and nonholonomic dynamics with applications to non-invariant Chaplygin systems

We study relations between vakonomically and nonholonomically constrained Lagrangian dynamics for the same set of linear constraints. The basic idea is to compare both situations at the level of variational principles rather than equations of motion, as has been done so far. The method seems to be quite powerful and effective. In particular, it allows one to derive, interpret and generalize many known results on non-Abelian Chaplygin systems. We also apply it to a class of systems on Lie groups with a left-invariant constraints distribution. Concrete examples of the unicycle in a potential field, the two-wheeled carriage and the generalized Heisenberg system are discussed.


Introduction
State of research. The problem of obtaining the equations of motion of a mechanical system in the presence of constraints has a long history and has attracted the attention of many famous mathematicians (see e.g., [19] for a brief historical discussion, compare also [1]). In general, constraints are introduced by specifying a submanifold (or simply a subset) C ⊂ TQ of the tangent bundle of the configuration manifold Q. Typically C is assumed to be a linear (non-integrable) distribution (we then speak of the linear case). In principle there are two non-equivalent ways of generating the dynamics from the constraints C. They are known as the nonholonomic and vakonomic (i.e., variational of axiomatic kind [1]) methods. Nonholonomic dynamics are believed to be the ones describing the real physical movement of the constrained system [20]. They are obtained by means of Chetaev's principle (in the linear case we speak rather of d'Alembert's principle or the principle of virtual work). On the other hand, vakonomic dynamics are related to optimal control theory and are derived as solutions of a constrained variational problem.
The word nonholonomic is often used in two contexts: as an adjective describing the method of generating the dynamics by means of Chetaev's (d'Alembert's) principle, and as a substitute for the word nonintegrable in the description of the constraints distribution C. Because of the latter, vakonomic dynamics are sometimes called variational nonholonomic [4,8]. To avoid possible confusion, in this paper we reserve the name nonholonomic for its meaning related to Chetaev's principle. Let us remark that by the dynamics we understand the set of all trajectories of the considered system.
The problem of comparing the two (i.e., vakonomic and nonholonomic) non-equivalent ways of generating the constrained dynamics has long been addressed by the scientific community (see Sec. 3 in [19] and the references therein). This non-equivalence can be easily observed at the level of the equations of motion: in many cases vakonomic dynamics are much richer than the nonholonomic ones. Therefore, it is natural to address the following question: (Q1) Are the nonholonomic dynamics a subset of the vakonomic ones for a given constrained system?
It is also interesting to formulate this problem for a particular trajectory: (Q2) Is a given nonholonomic trajectory a vakonomic one?
Some authors ask also the inverse of the latter: (Q3) Is a given vakonomic trajectory a nonholonomic one?
In this paper we are concerned with giving answers to these questions. We will refer to them as the problems of comparison. Let us remark that in general it is easier to find a set of necessary and sufficient conditions for a given trajectory of a system to answer (Q2) or (Q3) than to find such conditions for the whole system to answer (Q1) (yet a positive answer to (Q2) on every nonholonomic trajectory provides an obvious sufficient condition). The reason is basically that restricting attention to a single trajectory helps to avoid certain global problems, such as the existence of solutions.
Although it was already observed at the end of the 19th century that the nonholonomic and vakonomic methods lead, in general, to different trajectories, the problem of their comparison was first stated in [20] and [4] (the former being inspired by an example of a unicycle moving on the plane discussed in [2]). In the terminology of [8], nonholonomic systems answering question (Q1) positively are called conditionally variational, whereas systems possessing only some nonholonomic trajectories that answer question (Q2) positively are called partially conditionally variational. Crampin and Mestdag [6] used the terms weak and strong consistency in a similar, but slightly different context. Some authors ([5,7,8]), in our opinion incorrectly, spoke about equivalence of nonholonomic and vakonomic dynamics for systems positively answering (Q1). At the level of dynamics there can be at most inclusion, not equivalence.
In general the sets of nonholonomic and vakonomic trajectories are not related by the inclusion of (Q1): a sphere rolling on a rotating table provides a natural example of a system where a generic nonholonomic trajectory cannot be a vakonomic one (see [7,20]). According to [8], Rumianstev [22] was the first to answer question (Q2). His answer, however, requires explicit knowledge of the Lagrange multipliers of the vakonomic trajectories, hence in fact of the solutions of the vakonomically constrained problem. So far all approaches to the comparison problems (Q1)-(Q3) used the method of comparing the nonholonomic and vakonomic equations of motion. Below we present the most important contributions to this field. The first three of them concern Chaplygin systems, i.e., Lagrangian systems defined on a principal G-bundle with the linear constraints given by a horizontal distribution of a G-principal connection. The last one is more general, yet the answer is presented in the form of an algorithm whose result is a priori unknown (concrete criteria are again derived only in the Chaplygin case). Let us note that most of these results require certain regularity assumptions about the Lagrangian. These are needed, in principle, to present implicit equations of motion in an explicit form.
• Favretti [7] constructs explicitly the vakonomic multiplier and, by comparing nonholonomic and vakonomic equations of motion, gives an answer in terms of the curvature of the constraints distribution. His assumptions are the G-invariance and certain regularity of the Lagrangian (the latter sufficient for the existence of a momentum map and for the nonholonomic multiplier to be given by a time-dependent section over the configuration space). For the constrained geodesic problem Favretti gave, in Thm. 3.2, a sufficient condition for the positive answer to (Q1). His assumptions are, however, very strong: the constraints distribution is totally geodesic and, additionally, the perpendicular distribution is integrable. As a particular example he showed that a two-wheeled carriage is a constrained mechanical system for which every nonholonomic trajectory is a vakonomic one.
• Fernandez and Bloch in [8] were able to find vakonomic multipliers explicitly for a simple (yet relatively wide) class of abelian Chaplygin systems (under the additional assumption that the Lagrangian is of mechanical type and regular). In consequence, they were able to apply Rumianstev's method to these systems. The resulting answer to (Q2) was given in terms of the geometry of the constraints distribution and vertical derivatives of the Lagrangian. For a more general class of nonabelian Chaplygin systems their approach gave a partial answer to (Q2). Fernandez and Bloch show, in particular, that the examples of a unicycle and a two-wheeled carriage answer (Q1) positively. The authors, however, claim incorrectly that the examples of the Heisenberg system, the Chaplygin skate and an invariant system on SO(3) give a negative answer to (Q1). We comment on and correct their mistake in Remark 6.4. Let us remark that Fernandez and Bloch also study an interesting question about the relation between problem (Q1) and the existence of an invariant measure.
• The paper of Crampin and Mestdag [6] attacks the problem of comparison from a slightly different angle. Their main idea is to express the constrained dynamics by means of certain vector fields on TQ. The comparison problem can then be solved by comparing these fields. To do so they use an ingenious technical tool of anholonomic frames (i.e., local frames adapted to the constraints distribution), which considerably simplifies the calculations. As a result they were able to extend the results of [8] to non-abelian Chaplygin systems (actually regaining some results from [7]). Crampin and Mestdag work under technical assumptions about the regularity of the Lagrangian (required if we want the Lagrangian dynamics to be locally a flow of a vector field) and restrict themselves to specific (yet quite general) classes of vakonomic multipliers (defined by a section over the configuration manifold). Due to these restrictions the relation of their results to the original problem (Q1) is not obvious. Clearly, if the multipliers can be determined (as turns out to be the case for Chaplygin systems), we answer (Q1). Nevertheless, they provide criteria for answering (Q2) and (Q3) on particular trajectories.
• Cortes, de Leon, Martin de Diego and Martinez [5] formulated both vakonomic and nonholonomic mechanics in a presymplectic framework similar to the Skinner-Rusk formalism. To compare both dynamics one has to apply a constraint algorithm and compare the resulting final constraint submanifolds. Using this method the authors re-obtained Theorems 3.1 and 3.2 of Favretti [7] under weaker assumptions. They also studied a few well-known examples, including a unicycle.
We discuss the relation of the above results with our work in Remark 4.14.
Crucial ideas. Our basic ideas are rooted in the conceptual works of Tulczyjew [24] on statics of physical systems. According to him, equilibria of such a system are determined by a variational principle which involves possible configurations of the system, (infinitesimal) processes (movements) that the system can be subject to, and its reactions to such movements (i.e., work that must be performed in order to change the configuration). Paper [24] deals mainly with statics, yet this limitation should be understood as a simple particular realization of a very general philosophy. For example, to treat Lagrangian mechanics we should translate configurations to admissible trajectories, infinitesimal movements to admissible variations, and reactions to the change of the action functional along these variations. The philosophy of Tulczyjew also gives a new insight into the idea of constraints: these are simply restrictions on the sets of configurations and/or infinitesimal movements of the system. It also changes the perspective of looking at the equations of motion: we should understand them not as constituting the system, but merely as reflections of the underlying variational principle, which is the basis of every study. Actually this point of view is not new, just "out of fashion" at present. It can be traced back as far as Lagrange himself (see the first comment in Koiller's paper [17]). Describing nonholonomic and vakonomic Lagrangian dynamics in the spirit of Tulczyjew's variational principles is elementary. Given a constraints submanifold C ⊂ TQ we define admissible trajectories for both situations as those paths γ(t) ∈ C which are the tangent lifts of their base paths. The reactions are again common for both situations and given by the changes of the action functional (defined by means of the Lagrangian).
The only difference appears at the level of admissible variations: in the nonholonomic case they are described by Chetaev's principle (for linear constraints they are performed in the directions of C), whereas in the vakonomic case we consider those variations that respect the constraints. In this way we are able to present both dynamics within the common framework of Tulczyjew's variational principles. This observation should be attributed to Gracia, Martin and Munos [13]. Similar remarks have been made before (see e.g., [4]), yet the authors of [13] were, in our opinion, the first to fully understand the common nature of nonholonomic and vakonomic dynamics. In this context one should also mention the later works [10,14,19].
The main thought of this paper comes directly from [24] and [13]. Namely, we compare nonholonomic and vakonomic dynamics at the level of the corresponding variational principles (in fact it is enough to concentrate on admissible variations), not equations of motion, as is usually done. In this way we get to the point where the actual differences between these two dynamics come from. Differences in the equations of motion are just an emanation of these basic differences. And, after all, questions (Q1)-(Q3) are not about the equality of equations, but the equality of trajectories. From the technical side we must admit a strong inspiration from [6]. The idea of adapting the frames to the constraints distribution allowed us to treat Chaplygin systems easily.
Our results. In Section 3 we present the philosophy of Tulczyjew's [24] variational principles applied to Lagrangian dynamics. We introduce restricted variational principles which correspond to constrained systems and discuss the particular cases of nonholonomically and vakonomically constrained Lagrangian dynamics, proceeding in accordance with [13]. Concerning the problems of comparison we prove the abstract Proposition 3.7, which states that restricting the variational principle results in extending the set of extremals. A slight generalization in Proposition 3.9 allows us to incorporate symmetries of the Lagrangian: we can compare the extremals of two different variational principles, provided that we can compare the admissible variations up to the symmetries of the Lagrangian. This possibly gives a new insight into the interesting problem of relations between the symmetries of the system and the constraints (see [19], Sec. 4.4 and the references therein).
The results of Section 3 have a very formal and abstract character, and may seem to introduce superfluous formalism or just vainly reformulate things that are well known. To show that this is not so, we present, in Section 4, their application to the problems of comparison (Q2) and (Q3) for a broad class of systems with linear constraints, namely non-invariant Chaplygin systems (i.e., Chaplygin systems without the G-invariance assumption). Our biggest gain is simplicity, as admissible variations are much easier objects to work with than the equations of motion, the latter being derived from the former. Therefore the proof of our main result, Theorem 4.8, is straightforward. This result fully characterizes (in terms of the geometry of the (non-invariant) Chaplygin system): (a) those nonholonomic extremals which are simultaneously unconstrained extremals; (b) those nonholonomic extremals which are simultaneously vakonomic extremals (i.e., it provides an answer to (Q2)); (c) those vakonomic extremals (associated with a given Lagrange multiplier) which are simultaneously nonholonomic extremals (thus it provides an answer to (Q3)).
The known results on Chaplygin systems from [6,7,8] follow easily as corollaries, as pointed out in Remark 4.14 and Corollary 4.13. Let us mention that we do not just regain these results, but also substantially generalize them, as in our approach any regularity conditions are superfluous and the role of the symmetry conditions becomes apparent. Actually, it turns out that the symmetry is not as important as the existence of the natural splitting of the configuration space into horizontal and vertical parts.
In the remaining part of Section 4 we derive the precise form of the vakonomic multiplier (Proposition 4.9), discuss the relation between questions (Q2) and (Q3) (Lemma 4.10) and study (non-invariant) Chaplygin systems subject to additional symmetry conditions (Corollaries 4.11-4.13).
Section 5 presents an application of our general methods from Section 3 to a particular class of systems on Lie groups with linear constraints defined by a left invariant distribution. Such systems, with an additional assumption of the symmetry of the Lagrangian, were considered for instance in [17]. In this case we prove Theorem 5.3 which answers the same questions as Theorem 4.8 for the considered class of systems (in fact both theorems are closely related as is explained in Remarks 5.7 and 5.8). In this case we can also derive the precise form of the vakonomic multiplier and, moreover, write explicitly the nonholonomic equations of motion (Lemma 5.4). In Corollary 5.6 we discuss the special case of a system with an additional symmetry.
In Section 6 we study concrete examples of nonholonomic systems with linear constraints. These include the unicycle (Example 6.1), the two-wheeled carriage (Example 6.2) and the Heisenberg system (Example 6.3 and its generalization in Example 6.6). All these situations fall within the common setting of Sections 4 and 5 described in Remarks 5.7 and 5.8. Therefore we use them to illustrate the results from both Sections 4 and 5. For all the considered situations, which were widely studied (for instance in [2,4,5,6,7,8,14]), our methods provide an elegant answer to question (Q1) (in most cases already known in the literature). Of particular interest is Example 6.2 where, from purely geometric (Lie algebraic) and relatively simple considerations, we were able to re-obtain an interesting result from [6]: the two-wheeled carriage with a shifted center of mass answers (Q1) positively if and only if the parameters of the system satisfy a certain algebraic condition. In [6] this case is described by a vakonomic multiplier equal to the momentum shifted by a constant of motion.
In addition to our main question (Q1), in all the considered examples we were also able to determine those nonholonomic trajectories which are simultaneously extremals of the unconstrained dynamics. We also derived the general form of the vakonomic multiplier.
Finally let us note that our Examples 6.3 and 6.6 contradict Proposition 3(5) in [8], which states that a system on a 3-dimensional manifold with a 2-dimensional non-integrable constraints distribution cannot answer question (Q1) positively. We explain this mistake of Fernandez and Bloch in Remark 6.4.

Preliminaries
Throughout the paper we work in the $C^\infty$-smooth category. By $Q$ we denote an $n$-dimensional smooth manifold, by $\tau_Q : TQ \to Q$ its tangent bundle, and by $\pi_Q : T^*Q \to Q$ its cotangent bundle. We use the symbol $\mathfrak{X}(Q)$ to denote the $C^\infty(Q)$-module of vector fields on $Q$. When working with local coordinates, the Einstein summation convention is always assumed.
Local coordinates. Let us introduce a local coordinate system $(q^a)$, $a = 1, 2, \ldots, n$, on a manifold $Q$. Such a system induces a coordinate system $(q^a, \dot q^b)$, $a, b = 1, \ldots, n$, on $TQ$, i.e., the $\dot q^b$'s are the coordinates of a vector in $TQ$ with respect to the local basis $\{\partial_{q^a}\}$ or, equivalently, $\dot q^b := \langle dq^b, \cdot\rangle : TQ \to \mathbb{R}$.
In our work we shall, however, also use another coordinate system $(q^a, v^b)$ on $TQ$ associated with a given local basis $\{e_a\}$ of $TQ$. In other words, $v^b := \langle\theta^b, \cdot\rangle : TQ \to \mathbb{R}$, where $\{\theta^b\}$ is the local coframe dual to $\{e_a\}$, i.e., $\langle\theta^b, e_a\rangle = \delta^b_a$. Such coordinates were considered in the context of nonholonomic constraints by Crampin and Mestdag [6]. These authors call the basis $\{e_a\}$ an anholonomic frame. To describe the passage between the two coordinate systems $(q^a, \dot q^b)$ and $(q^a, v^b)$, introduce a family of transition matrices $A^a_b(q)$ relating the two local bases:
$$e_b = A^a_b(q)\,\partial_{q^a}, \qquad\text{so that}\qquad \dot q^a = A^a_b(q)\,v^b. \tag{2.1}$$
The above formula is useful when describing the tangent lift of a curve in $Q$. Namely, let $q : [t_0, t_1] \to Q$ be a smooth curve described locally by $q(t) \sim (q^a(t))$. Its tangent lift, which we denote by $\mathbf{t}q(t)$ or $\mathbf{t}_t q(t)$, is a curve in $TQ$ described locally by $(q^a(t), v^b(t))$ in coordinates $(q^a, v^b)$, where $\dot q^a$ and $v^b$ are related by (2.1), i.e., $\frac{d}{dt}q^a(t) = A^a_b(q(t))\,v^b(t)$. Let now $C^a_{bc}(q)$ be the coefficients of the Lie bracket on $\mathfrak{X}(Q)$ with respect to the basis $\{e_a\}$, that is,
$$[e_b, e_c] = C^a_{bc}(q)\,e_a.$$
Using formula (2.1) and the fact that $[\partial_{q^b}, \partial_{q^c}] = 0$, one can easily deduce the following relation between $C^a_{bc}(q)$ and $A^a_b(q)$:
$$C^a_{bc}(q)\,A^d_a(q) = A^a_b(q)\,\frac{\partial A^d_c}{\partial q^a}(q) - A^a_c(q)\,\frac{\partial A^d_b}{\partial q^a}(q).$$
We leave the justification of this formula as an exercise.
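For readers who prefer to see the exercise carried out, the following is a minimal sketch of the computation. It assumes the convention $e_b = A^a_b(q)\,\partial_{q^a}$ (the reading of (2.1) that is consistent with $\dot q^a = A^a_b v^b$) and $[e_b, e_c] = C^a_{bc} e_a$; these conventions are reconstructed from the surrounding text rather than quoted from it.

```latex
% Sketch: relation between the structure functions C^a_{bc} and the
% transition matrices A^a_b, under the assumed convention e_b = A^a_b \partial_{q^a}.
\begin{align*}
[e_b, e_c]
  &= \bigl[A^a_b\,\partial_{q^a},\; A^d_c\,\partial_{q^d}\bigr] \\
  &= A^a_b\,\frac{\partial A^d_c}{\partial q^a}\,\partial_{q^d}
   - A^d_c\,\frac{\partial A^a_b}{\partial q^d}\,\partial_{q^a}
   + A^a_b A^d_c\,\underbrace{[\partial_{q^a}, \partial_{q^d}]}_{=\,0} \\
  &= \Bigl(A^a_b\,\frac{\partial A^d_c}{\partial q^a}
         - A^a_c\,\frac{\partial A^d_b}{\partial q^a}\Bigr)\partial_{q^d}
   % summation indices renamed in the second term
\end{align*}
% Comparing with [e_b, e_c] = C^a_{bc} e_a = C^a_{bc} A^d_a \partial_{q^d} gives
\[
  C^a_{bc}\,A^d_a
  = A^a_b\,\frac{\partial A^d_c}{\partial q^a}
  - A^a_c\,\frac{\partial A^d_b}{\partial q^a}.
\]
```

In the holonomic case $A^a_b = \delta^a_b$ the right-hand side vanishes, so $C^a_{bc} = 0$, as expected for a coordinate frame.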
The canonical flip. It is well known (see e.g., [9]) that the iterated tangent bundle $TTQ$ admits an involutive map called the canonical flip, which is defined as $\kappa_Q : \mathbf{t}_s\mathbf{t}_t q(t,s) \longmapsto \mathbf{t}_t\mathbf{t}_s q(t,s)$, where $q(t,s) \in Q$ is any representative of an element $\mathbf{t}_s\mathbf{t}_t q(t,s) \in TTQ$.
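To derive the local form of $\varepsilon_Q$, introduce first natural coordinates $(q^a, z_b)$ on $T^*Q$ dual to the coordinates $(q^a, v^b)$ on $TQ$, i.e., $z_b := \langle\cdot, e_b\rangle : T^*Q \to \mathbb{R}$. We have induced coordinates $(q^a, z_b, \dot q^c, \dot z_d)$ on $TT^*Q$ defined by $\dot q^c := \langle dq^c, \cdot\rangle : TT^*Q \to \mathbb{R}$ and $\dot z_d := \langle dz_d, \cdot\rangle : TT^*Q \to \mathbb{R}$. Since the canonical pairing $\langle\cdot,\cdot\rangle_{\tau_Q}$ locally reads as
$$\langle (q^a, z_b), (q^a, v^b)\rangle_{\tau_Q} = z_b v^b,$$
its tangent lift $\langle\cdot,\cdot\rangle_{T\tau_Q}$ in coordinates reads as
$$\langle (q^a, z_b, \dot q^c, \dot z_d), (q^a, v^b, \dot q^c, \dot v^d)\rangle_{T\tau_Q} = \dot z_b v^b + z_b \dot v^b. \tag{2.5}$$
On $T^*TQ$ we can introduce coordinates $(q^a, v^b, p_c, w_d)$ induced from $(q^a, v^b)$ on $TQ$, i.e., $p_c := \langle\cdot, \partial_{q^c}\rangle : T^*TQ \to \mathbb{R}$ and $w_d := \langle\cdot, \partial_{v^d}\rangle : T^*TQ \to \mathbb{R}$. Now the canonical pairing $\langle\cdot,\cdot\rangle_{\tau_{TQ}}$ reads simply
$$\langle (q^a, v^b, p_c, w_d), (q^a, v^b, \dot q^c, \dot v^d)\rangle_{\tau_{TQ}} = p_c \dot q^c + w_d \dot v^d. \tag{2.6}$$
Expressing (2.4) with the help of (2.2), (2.5) and (2.6) leads to the following local formula:
$$\varepsilon_Q : (q^a, v^b, p_c, w_d) \longmapsto \bigl(q^a,\; z_b = w_b,\; \dot q^c = A^c_b(q)\,v^b,\; \dot z_d = A^c_d(q)\,p_c + C^a_{bd}(q)\,v^b w_a\bigr).$$
Let us end this part by discussing some properties of the pairing $\langle\cdot,\cdot\rangle_{T\tau_Q}$. Clearly, $TT^*Q$ possesses the structure of a double vector bundle (consult, e.g., [9] and the references therein), with the following two compatible vector bundle structures: $\tau_{T^*Q} : TT^*Q \to T^*Q$ and $T\pi_Q : TT^*Q \to TQ$. Similarly, $TTQ$ is a double vector bundle with two compatible vector bundle structures $\tau_{TQ} : TTQ \to TQ$ and $T\tau_Q : TTQ \to TQ$. The pairing $\langle\cdot,\cdot\rangle_{T\tau_Q} : TT^*Q \times_{TQ} TTQ \to \mathbb{R}$ is bilinear (i.e., linear in both arguments) with respect to the vector bundle structures $T\pi_Q$ and $T\tau_Q$, being the tangent lift of $\langle\cdot,\cdot\rangle_{\tau_Q} : T^*Q \times_Q TQ \to \mathbb{R}$, which was already bilinear with respect to the vector bundle structures $\pi_Q$ and $\tau_Q$. This bilinearity can be checked directly by looking at the coordinate formula (2.5).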
Observe also that for z b = 0, formula (2.5) reduces to Geometrically it means that given a vector Φ ∈ T 0 T * Q and a vector V ∈ TTQ such that Tπ Q (Φ) = Tτ Q (V ), we have

An abstract approach to constrained Lagrangian dynamics
In this section, we look at the (constrained) Lagrangian dynamics from a slightly more formal and more abstract point of view than is usually done in the literature. Such an approach will allow us to treat different constrained variational problems in a unified way. Similar ideas have already been presented (see for example [10,13,24]).
Variational principles. The standard variational problem on $TQ$ is constituted by a function $L : TQ \to \mathbb{R}$ called a Lagrangian. Given any smooth path $\gamma : I = [t_0, t_1] \to TQ$ we can define the action along $\gamma$ by the formula $S_L(\gamma) := \int_I L(\gamma(t))\,dt$.
In the standard problem we consider $S_L$ only for paths $\gamma : I \to TQ$ which are tangent prolongations of curves in $Q$, i.e., $\gamma(t) = \mathbf{t}_t q(t)$, where $q(t) = \tau_Q \circ \gamma(t)$ is the base projection of $\gamma$.
A variation along a path $\gamma$ is simply a vector field along $\gamma$, i.e., a curve $\delta\gamma : I \to TTQ$ which projects to $\gamma$ under $\tau_{TQ}$. Among all variations of a certain $\gamma$ we can distinguish the class of variations with vanishing end-points, i.e., $\delta\gamma$ such that $T\tau_Q(\delta\gamma(t)) \in TQ$ vanishes at $t_0$ and $t_1$.
In the standard variational problem we consider variations generated by homotopies. Namely, let $q(t,s) \in Q$ be a one-parameter family of base paths. This homotopy defines a natural variation $\delta\gamma(t) = \mathbf{t}_s\big|_{s=0}\mathbf{t}_t q(t,s)$ along the path $\gamma(t) = \mathbf{t}_t q(t,0)$. Geometrically such $\delta\gamma$ is generated by a curve $\xi(t) := \mathbf{t}_s\big|_{s=0} q(t,s) \in T_{q(t,0)}Q$ with the help of the canonical flip $\kappa_Q : TTQ \to TTQ$: $\delta\gamma(t) = \kappa_Q\bigl(\mathbf{t}_t\xi(t)\bigr)$. To emphasize the role of the generator $\xi(t)$ (sometimes also called an infinitesimal variation) we denote $\delta\gamma =: \delta_\xi\gamma$.
Assume now that, in local coordinates $(q^a, v^b)$ on $TQ$, the path $\gamma(t) = \mathbf{t}_t q(t)$ corresponds to a curve $(q^a(t), v^b(t))$ and the generator $\xi(t)$ to $(q^a(t), w^b(t))$. Due to formula (2.2), the variation of $\gamma(t)$ generated by $\xi(t)$ reads locally, in coordinates $(q^a, v^b, \dot q^c, \dot v^d)$, as
$$\delta_\xi\gamma(t) \sim \bigl(q^a(t),\; v^b(t),\; A^c_b(q(t))\,w^b(t),\; \dot w^d(t) + C^d_{bc}(q(t))\,v^b(t)\,w^c(t)\bigr).$$
We will need the following three facts about admissible variations, which follow directly from the above coordinate description.
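The coordinate form of a variation generated by $\xi(t) \sim (q^a(t), w^b(t))$ can also be checked by hand, without the canonical flip, using equality of mixed partial derivatives of a homotopy $q(t,s)$. The sketch below does this under assumed conventions reconstructed from the text: $\partial_t q^a = A^a_b v^b$, $\partial_s q^a = A^a_b w^b$, $[e_b, e_c] = C^a_{bc} e_a$ with $e_b = A^a_b\,\partial_{q^a}$, and $\delta v^b := \partial_s v^b|_{s=0}$.

```latex
% Sketch: the \dot v-component of \delta_\xi\gamma from symmetry of second derivatives.
\begin{align*}
\partial_s\bigl(A^a_b\,v^b\bigr) &= \partial_t\bigl(A^a_b\,w^b\bigr)
  && \text{(both sides equal } \partial_s\partial_t q^a\text{)} \\
\frac{\partial A^a_b}{\partial q^d}\,A^d_c\,w^c v^b + A^a_b\,\delta v^b
  &= \frac{\partial A^a_b}{\partial q^d}\,A^d_c\,v^c w^b + A^a_b\,\dot w^b
  && \text{(chain rule)} \\
A^a_b\bigl(\delta v^b - \dot w^b\bigr)
  &= \Bigl(A^d_c\,\frac{\partial A^a_b}{\partial q^d}
         - A^d_b\,\frac{\partial A^a_c}{\partial q^d}\Bigr) v^c w^b
   = C^e_{cb}\,A^a_e\,v^c w^b
  && \text{(structure functions)} \\
\delta v^e &= \dot w^e + C^e_{cb}\,v^c w^b
  && \text{(}A\text{ is invertible)}
\end{align*}
% The \dot q-component is simply \partial_s q^a = A^a_b w^b.
```

In holonomic coordinates ($A^a_b = \delta^a_b$, $C^a_{bc} = 0$) this reduces to the familiar rule $\delta\dot q^a = \frac{d}{dt}\delta q^a$.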
Proposition 3.1. The admissible variation $\delta_\xi\gamma$ (i) is linear with respect to $\xi$, i.e., $\delta_{\xi+\xi'}\gamma = \delta_\xi\gamma + \delta_{\xi'}\gamma$, where the addition is performed with respect to the vector bundle structure $\tau_{TQ} : TTQ \to TQ$.
The standard variational problem is the search for all paths $\gamma$ from the considered class such that, for every variation $\delta\gamma$ with vanishing end-points (as considered above), the associated variation of the action $S_L$ at $\gamma$ along $\delta\gamma$, $\langle dS_L(\gamma), \delta\gamma\rangle := \int_I \langle dL(\gamma(t)), \delta\gamma(t)\rangle\,dt$, vanishes.
Remark 3.2. The usage of the symbol dS L (γ) can be made rigorous in the framework of analysis on Banach manifolds. The action S L can be understood as a function on the manifold of paths γ of a certain class, whereas δγ's are elements of the tangent space to that manifold. The interested reader may consult [21].
Motivated by the standard situation we propose the following general definition in the spirit of [24]. An admissible path $\gamma \in \mathcal{T}$ is called an extremal (or a trajectory) of the variational principle $\mathcal{P}$ if and only if $\langle dS_L(\gamma), \delta\gamma\rangle = 0$ for every $\delta\gamma \in \mathcal{W}^0_\gamma$, i.e., $\gamma$ is a critical point of the action $S_L$ relative to admissible variations with vanishing end-points. The set of all extremals of $\mathcal{P}$ will be denoted by $\Gamma_{\mathcal{P}} \subset \mathcal{T}$. Notice that searching for a critical point of $S_L$ relative to variations $\delta\gamma \in \mathcal{W}^0_\gamma$ does not necessarily correspond to minimization (maximization) of $S_L$.

Remark 3.4.
Despite the fact that in the above definition of the extremal we used only admissible variations with vanishing end-points, the role of the end-points should not be underestimated. In fact, the full variational principle should describe the reaction of the system to an arbitrary admissible variation, i.e., it should contain not only the information about the extremal, but also the boundary terms which describe the initial and final momenta of the system (cf. Sec. 15 in [24]). The need to include non-vanishing end-points also becomes apparent in some natural situations in variational calculus and control theory, where more general boundary conditions are needed.
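Clearly, the extremals of the standard variational problem on $TQ$ (i.e., solutions of the associated Euler-Lagrange equations) are extremals of the following variational principle: $\mathcal{P}^{st}_{TQ} = (L, \mathcal{T}^{st}_{TQ}, \mathcal{W}^{st}_{TQ})$, where $\mathcal{T}^{st}_{TQ}$ consists of the tangent lifts $\gamma = \mathbf{t}_t q$ of base paths and $\mathcal{W}^{st}_{TQ}$ consists of the variations $\delta_\xi\gamma$ generated by curves $\xi(t) \in T_{q(t)}Q$. We shall refer to the elements of $\mathcal{W}^{st}_{TQ}$ as standard admissible variations.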
Constraints. The above Definition 3.3 may seem artificial, but it turns out to be particularly useful in the context of constraints. We say that a variational principle $\mathcal{P}' = (L, \mathcal{T}', \mathcal{W}')$ is a restriction of $\mathcal{P} = (L, \mathcal{T}, \mathcal{W})$ if it is obtained from $\mathcal{P}$ by shrinking the set of admissible trajectories and/or admissible variations, i.e., $\mathcal{T}' \subseteq \mathcal{T}$ and $\mathcal{W}' \subseteq \mathcal{W}$. One should think that the principle $\mathcal{P}$ describes an unconstrained system and $\mathcal{P}'$ is the same system with imposed constraints. Usually these restrictions are somehow related to additional geometric structures on the bundle $TQ$. Two important examples of such a situation are the vakonomic and nonholonomic variational principles associated with a submanifold $C \subset TQ$, being two restrictions of the standard variational principle $\mathcal{P}^{st}_{TQ}$. Shortly we shall show that the extremals of these restricted variational principles constitute the vakonomically and nonholonomically constrained Lagrangian dynamics in the standard sense.
We define a vakonomic variational principle $\mathcal{P}^{vak}_C = (L, \mathcal{T}^{vak}_C, \mathcal{W}^{vak}_C)$ associated with $C \subset TQ$ by restricting our attention to admissible paths in $C$ and admissible variations tangent to $C$. That is, $\mathcal{T}^{vak}_C := \mathcal{T}_C = \{\gamma \in \mathcal{T}^{st}_{TQ} : \gamma(t) \in C\}$ and $\mathcal{W}^{vak}_C := \{\delta_\xi\gamma \in \mathcal{W}^{st}_{TQ} : \delta_\xi\gamma(t) \in TC\}$. Observe that the above notion of admissible variations $\mathcal{W}^{vak}_C$ coincides with the notion of vakonomic variations present in the literature (see e.g., [1,4,19]). Clearly any homotopy $\gamma(t,s) \subset C$ produces a variation in $\mathcal{W}^{vak}_C$. Conversely, every variation in $\mathcal{W}^{vak}_C$ can be obtained from a homotopy $\gamma(t,s)$ which lies in $C$ up to $o(s)$-terms. Such a relaxation of the condition $\gamma(t,s) \subset C$ helps to exclude the problems of singular trajectories and abnormal extremals (see Subsec. 1.4 in [1]). In light of this observation it is clear that the extremals of $\mathcal{P}^{vak}_C$ correspond to the extremal points of $S_L$ on the set of admissible paths $\mathcal{T}^{vak}_C$, i.e., to trajectories of the vakonomically constrained system. Thus we have proved Proposition 3.5. The set of extremals of the vakonomic variational principle $\mathcal{P}^{vak}_C$ is precisely the constrained vakonomic dynamics associated with $C$.

Observe that although the set of vakonomic admissible variations is easily described as $\mathcal{W}^{vak}_C = \mathcal{W}^{st}_{TQ} \cap TC$, in general it is difficult to specify the generators $\xi(t)$ for which a given admissible variation $\delta_{\xi(t)}\gamma(t)$ of the standard variational principle belongs to $TC$. Note also that since the vakonomic variations are tangent to $C$, the vakonomic dynamics are determined by the restriction of $L$ to $C$. Throughout the paper we use the abbreviated symbol $\Gamma^{vak}_C$ (instead of $\Gamma_{\mathcal{P}^{vak}_C}$) to denote the set of extremals of the vakonomic variational principle $\mathcal{P}^{vak}_C$. With the same submanifold $C \subset TQ$ one can associate a different construction of a nonholonomic variational principle $\mathcal{P}^{nh}_C = (L, \mathcal{T}^{nh}_C, \mathcal{W}^{nh}_C)$. In this situation we again restrict ourselves to admissible trajectories in $C$, $\mathcal{T}^{nh}_C := \mathcal{T}_C$, but impose other restrictions on the set of admissible variations. Namely, according to Chetaev's principle we consider only those variations $\delta_\xi\gamma \in \mathcal{W}^{st}_{TQ}$ that are generated by infinitesimal variations $\xi(t)$ such that the vertical lift of $\xi(t)$ to the point $\gamma(t)$ is tangent to $C$. Notice that W nh C ⊂ TC ∩ VTQ, where $VTQ$ stands for the vertical distribution on $TQ$ defined as the kernel of $T\tau_Q : TTQ \to TQ$. We shall refer to the elements of $\mathcal{W}^{nh}_C$ as nonholonomic admissible variations. By the very definition of Chetaev's principle it is clear that Proposition 3.6. The set of extremals of the nonholonomic variational principle $\mathcal{P}^{nh}_C$ is precisely the constrained nonholonomic dynamics associated with $C$.
It is known that the extremals of $\mathcal{P}^{nh}_C$ do not correspond to minimization (maximization) of $S_L$. In fact, they are not "the shortest" but "the straightest" paths, as noticed by Hertz (see [19] and the references therein). Throughout the paper we use the abbreviated symbol $\Gamma^{nh}_C$ (instead of $\Gamma_{\mathcal{P}^{nh}_C}$) to denote the set of extremals of the nonholonomic variational principle $\mathcal{P}^{nh}_C$.
In the special case when the constraints are linear (affine), meaning that $C = D$ ($C = X + D$), where $D$ is a linear distribution (and $X$ is a vector field), Chetaev's principle becomes the well-known d'Alembert's principle: we consider admissible variations $\delta_\xi\gamma \in \mathcal{W}^{st}_{TQ}$ that are generated by infinitesimal variations $\xi(t)$ with values in $D$. In this case we have explicit information about the infinitesimal variations, i.e., $\xi(t) \in D_{q(t)}$. Note, however, that the variations $\delta_\xi\gamma$ will, in general, not be tangent to $C = D$. For this reason the knowledge of $L|_D$ is not sufficient to study nonholonomically constrained dynamics. For a deeper discussion of the constrained dynamics in a more general setting of algebroids consult [10,12].
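The remark that nonholonomic variations need not be tangent to $D$ can be made concrete in an adapted anholonomic frame. The sketch below is an illustration under assumptions that are not taken from the text: the frame $\{e_i, e_\alpha\}$ is chosen so that $D = \mathrm{span}\{e_i\}$ (hence $D = \{v^\alpha = 0\}$), the variation has the local form $\delta v^d = \dot w^d + C^d_{bc} v^b w^c$, and the unicycle frame at the end is one possible choice.

```latex
% Sketch: transverse component of a d'Alembert variation in an adapted frame.
% Assumptions: D = span{e_i} = {v^\alpha = 0}; path in D (v^\alpha = 0);
% generator in D (w^\alpha = 0, hence \dot w^\alpha = 0).
\[
  \delta v^\alpha
  = \dot w^\alpha + C^\alpha_{bc}\,v^b w^c
  = C^\alpha_{ij}\,v^i w^j .
\]
% Hence \delta_\xi\gamma is tangent to D exactly when C^\alpha_{ij} v^i w^j = 0,
% i.e. when the bracket [e_i, e_j] has no component transverse to D along (v, w).
%
% Illustration (hypothetical frame for a unicycle on Q = R^2 x S^1, coordinates (x, y, \varphi)):
%   e_1 = \cos\varphi\,\partial_x + \sin\varphi\,\partial_y,   e_2 = \partial_\varphi,
%   e_3 = -\sin\varphi\,\partial_x + \cos\varphi\,\partial_y,  D = span{e_1, e_2}.
\[
  [e_1, e_2] = -e_3
  \quad\Longrightarrow\quad
  C^3_{12} = -1,\; C^3_{21} = 1,
  \qquad
  \delta v^3 = v^2 w^1 - v^1 w^2 .
\]
```

So for this frame a variation generated by $\xi \in D$ stays tangent to $D$ only when $(w^1, w^2)$ is proportional to $(v^1, v^2)$, which shows how the non-integrability of $D$ separates the nonholonomic variations from the vakonomic ones.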

Equations of motion.
Calculating the equations describing the extremals of the variational principles $\mathcal{P}^{st}_{TQ}$, $\mathcal{P}^{vak}_C$ and $\mathcal{P}^{nh}_C$ considered above is not necessary from the point of view of our approach to the comparison problem, and hence this paragraph can be skipped without any loss in understanding of our ideas. The reason why we perform these calculations is to assure the reader that the formalism of variational principles leads to the same conclusions as the standard approach.
Euler-Lagrange equations, E-L equations shortly, describing the extremals of P st TQ can be derived as follows. Take γ ∈ T st TQ and δ ξ γ ∈ (W st TQ ) γ , recall the canonical isomorphism ε Q : T * TQ −→ TT * Q dual to κ Q introduced at the end of Section 2, and denote Λ L := ε Q • dL : TQ → TT * Q. Let us calculate and the subtraction in the first term of the last line is made with respect to the lifted vector-bundle structure Tπ Q : TT * Q → TQ (see the end of Section 2). Since both vectors Λ L (γ(t)) and t t λ L (γ(t)) project to the same covector λ L (γ(t)) under τ T * Q , the vector Λ L (γ(t)) − t t λ L (γ(t)) belongs to T 0 T * Q and thus, by (2.8), we have Thus for ξ(t) vanishing at the end-points we get Thus locally E-L equations describing the extremals of the variational principle P st TQ read as This formula agrees with the algebroid E-L equations introduced in [9]. Observe that if v a =q a are the standard coordinates on TQ, we have C a bc (q) = 0 and A a b (q) = δ a b , and thus (3.4) becomes the standard form of the E-L equations. Now we will show how to modify the above calculations to obtain the equations of motion for the constrained dynamics P vak C and P nh C associated with a submanifold C ⊂ TQ. For the vakonomic problem observe that since δ ξ γ ∈ TC, we can add to L any function φ(t, q, v) vanishing at (q, v) ∈ C without changing the value of the variation dS L (γ), δ ξ γ . Thus if γ ∈ T C is an extremal of the variational principle P st TQ with Lagrangian L + φ, then γ is an extremal of the principle P vak C with Lagrangian L. This reasoning gives sufficient (and also necessary -see e.g., Thm. 4.1 in [4] or Lemma 3 in [13]) conditions for extremals of P vak C . Locally these equations read as where C is locally described by equations Φ α (q, v) = 0, for α = 1, . . . , k, and µ α (t) are arbitrary functions (known usually as multipliers).
, the necessary and sufficient condition for γ(t) to be an extremal reads as Locally it takes the form where Φ α (q, v) are as above and ν α (t) are arbitrary functions. Again passing to the standard coordinate system v b =q b with C a bc (q) = 0 and A a b (q) = δ a b puts (3.6) into the well-known standard form.
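For orientation, the standard forms referred to above can be written side by side. The following is only a sketch in the standard coordinates (q^a, q̇^a), with Φ^α the constraint functions introduced above; the signs attached to the multipliers µ_α and ν_α are a convention assumed here (the multipliers are arbitrary functions, so signs can be absorbed into them).

```latex
% Unconstrained Euler-Lagrange equations
\frac{d}{dt}\frac{\partial L}{\partial \dot q^a} - \frac{\partial L}{\partial q^a} = 0 .

% Vakonomic equations: E-L equations of L + \mu_\alpha \Phi^\alpha, plus the constraints
\frac{d}{dt}\frac{\partial L}{\partial \dot q^a} - \frac{\partial L}{\partial q^a}
  = -\dot\mu_\alpha \frac{\partial \Phi^\alpha}{\partial \dot q^a}
    - \mu_\alpha \left( \frac{d}{dt}\frac{\partial \Phi^\alpha}{\partial \dot q^a}
    - \frac{\partial \Phi^\alpha}{\partial q^a} \right),
\qquad \Phi^\alpha(q,\dot q) = 0 .

% Nonholonomic (Chetaev) equations, plus the constraints
\frac{d}{dt}\frac{\partial L}{\partial \dot q^a} - \frac{\partial L}{\partial q^a}
  = \nu_\alpha \frac{\partial \Phi^\alpha}{\partial \dot q^a},
\qquad \Phi^\alpha(q,\dot q) = 0 .
```

Written this way, the two constrained systems differ by the term in which µ_α multiplies the Euler-Lagrange expression of Φ^α; the vakonomic multiplier enters together with its time derivative, whereas the nonholonomic multiplier enters only algebraically.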
A comparison of variational principles. Looking at admissible variations rather than equations of motion will allow us to compare extremals of different variational principles in a simple manner. Recall that Γ P denotes the set of extremals of a given variational principle P.
Proposition 3.7. Assume that P′ = (L, T ′, W′) is a restricted variational principle of P = (L, T , W), i.e., T ′ ⊆ T and W′ ⊆ W (i.e., for any γ ∈ T ′ we have W′ γ ⊆ W γ ). Then Γ P ∩ T ′ ⊆ Γ P′ .
Remark 3.8. The above proposition looks trivial; however, it allows an immediate derivation of some classical results such as:
• Proposition 6.2 in [5], which states that every trajectory of an unconstrained system that respects the constraints is a trajectory of a nonholonomically and vakonomically constrained system simultaneously. This is obvious in light of Proposition 3.7 as vakonomic and nonholonomic variational principles are restrictions of the standard variational principle.
• The remark after Theorem 2 in [6], which states that vakonomic trajectories with trivial multipliers µ α (t) = 0 (cf. equation (3.5)) are also nonholonomic trajectories. This again is obvious as any vakonomic extremal with trivial multipliers is, in fact, an extremal of the unconstrained variational problem and we can repeat the above reasoning.
• Theorem 3.2 (i) in [7], which states that for a sub-Riemannian geodesic problem with a totally geodesic constraints distribution (i.e., such that every unconstrained geodesic tangent to the constraints at a point remains tangent for its entire length) the nonholonomic geodesics are precisely the unconstrained geodesics respecting the constraints. This fact is again obvious: from Proposition 3.7 we know that the unconstrained geodesics respecting the constraints are also nonholonomic ones. Moreover, by the assumptions and by the uniqueness of (nonholonomic) geodesics with a given initial velocity, we get the equality of these two sets. Theorem 3.2 (ii), stating that in this case every nonholonomic geodesic is also a vakonomic one, is again clear in the light of Proposition 3.7. From this simple reasoning we see that the additional assumption present in Theorem 3.2 (that the distribution perpendicular to the constraints is integrable) is superfluous.
More generally, we can compare two variational principles P = (L, T , W) and P′ = (L, T ′, W′) defined on the same manifold TQ and with the same Lagrangian L even if W, T and W′, T ′ are not so directly related, provided that we have some information about infinitesimal symmetries of L.
The condition δγ − δγ ∈ (ker dL) γ from the above proposition may be understood as a symmetry condition. Indeed, it means that the set of admissible variations W 0 γ is contained in W 0 γ up to infinitesimal symmetries of L.
As a simple concrete illustration of our approach to the question of comparing variational principles we can easily prove the following well-known fact (compare e.g., Prop. 2.8 in [20]).
Proposition 3.10. Let D ⊂ TQ be a distribution on a manifold Q. Then vakonomic and nonholonomic variational principles associated to D (for the same Lagrangian) coincide, that is, P vak D = P nh D , if and only if D is integrable.
Proof. The reasoning uses the fact that D is integrable if and only if κ Q maps TD into TD (see Corollary 2.2). Suppose first that D is integrable. For a given admissible path γ(t) take a generator of a nonholonomic variation ξ(t) ∈ D. Now t t ξ(t) ∈ TD and hence δ ξ γ(t) = κ Q (t t ξ(t)) ∈ TD, thus δ ξ γ is a vakonomic variation. Conversely, given a vakonomic variation δ ξ γ(t) = κ Q (t t ξ(t)) ∈ TD, we have t t ξ(t) = κ Q (δ ξ γ(t)) ∈ TD due to the fact that κ Q is an involution. Hence ξ(t) ∈ D is a generator of a nonholonomic variation.
If D is not integrable then κ Q (V ) ∉ TD for some V ∈ T γ 0 D. Now choose a point t̄ ∈ (t 0 , t 1 ) and consider an admissible path γ ∈ T D and a generator ξ(t) ∈ D of a nonholonomic variation δ ξ γ such that γ(t̄) = γ 0 and t t=t̄ ξ(t) = V . Clearly δ ξ γ(t̄) = κ Q (t t=t̄ ξ(t)) = κ Q (V ) ∉ TD, and hence the nonholonomic variation δ ξ γ cannot belong to W vak D .
Constraints discussed in the above proposition are known as holonomic constraints. Notice that integrability of D is a necessary and sufficient condition for the principles P vak D and P nh D to coincide and thus it implies that the sets of extremals Γ vak D and Γ nh D coincide as well, but it is not necessary for the latter. For example, if L is constant, then Γ vak D = Γ nh D = T D (the set of all admissible paths), independently of D.
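To make the integrability criterion concrete, here is a small worked check on Q = R^3. It uses the classical Frobenius bracket test rather than the κ Q characterization employed in the proof (the two are equivalent descriptions of integrability); the specific distributions below are illustrative choices, not taken from the text.

```latex
% A non-integrable distribution on R^3 with coordinates (x, y, z):
D = \operatorname{span}\{X, Y\}, \qquad
X = \partial_x + y\,\partial_z, \qquad Y = \partial_y .

% Lie bracket of the generators:
[X, Y] = -\big(\partial_y\, y\big)\,\partial_z = -\partial_z .

% A vector aX + bY = a\,\partial_x + b\,\partial_y + a y\,\partial_z equals -\partial_z
% only if a = 0 and a y = -1, which is impossible. Hence
[X, Y] \notin D ,
% so D is not involutive, hence not integrable.

% By contrast, D' = \operatorname{span}\{\partial_x, \partial_y\} satisfies
[\partial_x, \partial_y] = 0 \in D' ,
% so D' is integrable (its leaves are the planes z = const).
```

In the language of Proposition 3.10, the vakonomic and nonholonomic variational principles coincide for D′ but not for D.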

Non-invariant Chaplygin systems
In this section we shall apply Proposition 3.9 to solve the comparison problems (Q2) and (Q3) for a particular class of systems with linear constraints, namely for (non-invariant) Chaplygin systems. In particular we will be able to recover (and generalize) many previous results from [6,7,8]. To demonstrate the power of our approach we shall omit the usual assumptions of the G-invariance of both: the constraints distribution and the Lagrangian.
Chaplygin systems. Consider a right principal G-bundle π : Q → M = Q/G. By a vertical distribution on Q we shall understand the distribution VQ := ker π * ⊂ TQ consisting of all vectors tangent to the fibres of π. By R g (p) or simply p · g we shall denote the action of an element g ∈ G on a point p ∈ Q. Note that the induced action (R g ) * preserves VQ, i.e., (R g ) * V q Q = V q·g Q. Usually in the literature the G-invariance of the Lagrangian L and the horizontal distribution HQ is assumed. Such systems are called Chaplygin systems [6,8], a term coined by Koiller [17]. Sometimes Chaplygin systems are described as abelian or non-abelian, depending on the commutativity of the structural Lie group G. Cantrijn et al. [3] use the term generalized Chaplygin system in the same sense as other authors [6,8,17] use the word non-abelian (to emphasize that the Lie group G is general). Our Definition 4.2 describes a more general situation with no invariance conditions assumed. To distinguish it from the standard setting we added the adjective non-invariant. Clearly the standard Chaplygin systems are a special case of the non-invariant Chaplygin systems with additional symmetry assumptions. Thus all our considerations about non-invariant systems will also hold in the standard case.
At this point it is worth remarking on the side convention. We speak about systems with the right action of the structural group following the classical textbook [16]. However, all our results remain valid also for systems with the left group action, provided that we carefully substitute the right action with the left action, change ad g −1 to ad g , etc.
With a given (non-invariant) Chaplygin system one can naturally associate nonholonomically and vakonomically constrained dynamics, taking HQ to be the constraints distribution C. Note that the admissible paths T HQ in the corresponding variational principles are precisely the tangent lifts of the horizontal curves in Q (we shall therefore refer to the elements of T HQ as horizontal admissible paths). In order to solve the comparison problems for these two dynamics we shall investigate more deeply the geometry of the (non-invariant) Chaplygin system.

Fundamental vector fields.
Denote by g the tangent space of the Lie group G at the identity e equipped with the canonical left Lie algebra structure [·, ·] g : g × g → g. Note that for any q ∈ Q, since the pointed fibre (Q π(q) , q) can be canonically identified with (G, e) via the G-action, we can canonically identify V q Q with g = T e G, and therefore there exists a vector bundle isomorphism φ Q : VQ → Q × g. Now for each a ∈ g we can construct a fundamental vector field a defined by a = (φ Q ) −1 (Q × {a}). It is well-known [16] that the flow of a at t is R exp(t·a) , that (R g ) * a = ad g −1 a and that the association a → a is a Lie algebra homomorphism, i.e., [ a, b] = [a, b] g for each a, b ∈ g.
The canonical splitting and connection 1-forms. The (non-invariant) Chaplygin system on π : Q → M provides us with a canonical splitting TQ = HQ ⊕ Q VQ. Combining this with the canonical isomorphism φ Q : VQ ≈ Q × g one gets TQ = HQ ⊕ Q (Q × g) .
Using the above identification we can project every vector in TQ to its g-part. We shall denote this projection by ω : TQ → g. Usually ω is called a 1-form of the connection HQ.
Note also that the canonical splitting TQ = HQ ⊕ Q VQ induces the splitting TTQ = THQ ⊕ TQ TVQ. Again we can combine the latter with the tangent map of the canonical isomorphism Tφ Q : TVQ ≈ TQ × Tg ≈ TQ × g × g and get TTQ = THQ ⊕ TQ (TQ × g × g) .
It follows that every vector in TTQ can be projected to its Tg = g × g-part: Tω : TTQ → Tg ≈ g × g. Clearly, this map is simply the 1-form of the lifted connection THQ.
Horizontal lifts. At each point q ∈ Q the tangent map π * is an isomorphism between H q Q and T π(q) M . Therefore, given a vector X ∈ T x M and a point q ∈ Q such that π(q) = x, we can lift X to a unique horizontal vector X q ∈ H q Q such that π * X q = X. In other words, we have a canonical vector bundle isomorphism h : Q × M TM → HQ such that X q = h(q, X). Applying the lifting procedure point-wise to a base vector field X ∈ X (M ) we obtain its horizontal lift X ∈ X (Q).
The construction of the horizontal lift allows us to introduce several interesting geometric structures associated with the structure of a (non-invariant) Chaplygin system on π such as the curvature of HQ, the map B (which measures the rate of G-invariance of HQ) and two particular derivatives of the Lagrangian (we will call them a horizontal and a vertical derivative). We shall describe these in the remaining part of this paragraph.
It is well known that for any two base vector fields X, Y ∈ X (M ) the vector [ X, Y ] − [X, Y ] at q ∈ Q belongs to V q Q (i.e., is vertical). Moreover the association (X, Y ) → [ X, Y ] − [X, Y ] is C ∞ (M )-linear with respect to both X and Y (i.e., has tensorial character). Therefore it defines a bilinear and skewsymmetric map R : Q × M ∧ 2 TM −→ VQ.
Combining R with the g-projection ω : VQ ⊂ TQ → g we obtain a bilinear skew-symmetric g-valued map Similarly, observe that for any base vector field X ∈ X (M ) and for any a ∈ g, the Lie bracket [ X, a] at q ∈ Q is vertical. Moreover the association (X, a) → [ X, a] is C ∞ (M )-linear (tensorial) with respect to X and R-linear with respect to a. Therefore it defines a bilinear map Combining B with the g-projection ω : VQ ⊂ TQ → g gives us a g-valued bilinear map The map B measures the non-invariance of the horizontal distribution with respect to the G-action as the following remark explains.

Remark 4.3.
For a given a ∈ g consider a curve g a (t) := exp(t · a) ∈ G (i.e., q → q · g a (t) is the flow of the fundamental vector field a). Now for a given X ∈ T π(q) M consider a curve t → (R g a (t) ) * X q − X q·g a (t) . Clearly the first jet of this curve at t = 0 is a vector in T 0q TQ. Due to the canonical identification T 0 TQ ≈ TQ × Q TQ, we may represent this vector as a pair of vectors in T q Q (in fact it turns out that both vectors are elements of V q Q ⊂ T q Q). The first of these vectors is represented by the curve q · g a (t), thus it is the fundamental field a q . By the definition of the Lie derivative, the second is B(q)(X, a). We conclude that if HQ is G-invariant then (R g ) * X q = X q·g , and thus B(q)(X, a) = 0.
Indeed, from the G-invariance of HQ we conclude that Recall the lifting isomorphism h : Q × M TM → HQ. For a given X ∈ T x M consider a map h(·, X) : Q x → HQ. Now for a given b ∈ g, by h b X q ∈ T Xq HQ we shall denote the tangent map of h(·, X) evaluated on the fundamental vector field b q . In other words, h b X q is the first jet at t = 0 of the curve t → X q·exp(t·b) . We define the horizontal derivative of the Lagrangian BL : HQ → g * by the formula Again given b ∈ g we can define an object of a similar nature, namely the vertical derivative of the Lagrangian FL : HQ → g * by the formula In other words, ⟨FL( X q ), b⟩ is just the standard fiber-wise derivative of L in the direction of b. In the case of the standard (G-invariant) Chaplygin system (with a hyper-regular Lagrangian) the map FL( X) coincides with the notion of the momentum map (restricted to HQ ⊂ TQ) along a trajectory of the system (cf. Sec. 3 of [7]). The derivatives BL and FL allow one to express easily the condition of the symmetry of the Lagrangian. Indeed, if L is G-invariant, then for any g(t) ∈ G we have 0 = d/dt| t=0 L((R g(t) ) * X q ) = ⟨dL( X q ), t t=0 (R g(t) ) * X q ⟩.
Take now g(t) = exp(t · b) for any b ∈ g. We can decompose the first jet of (R g(t) ) * X q into the sum (with respect to Tτ Q : TTQ → TQ) of the first jets of (R g(t) ) * X q − X q·g(t) and X q·g(t) . By Remark 4.3, the first of these curves corresponds to a vector ( b q , B(q)(X, b)) ∈ T q Q ⊕ T q Q ≈ T 0q TQ. The second is simply h b X q . Now the sum of these two vectors is equal to the sum taken with respect to the vector bundle structure in τ TQ : TTQ → TQ. It follows that Note that above we had to apply the addition in τ TQ not Tτ Q , since dL is not linear with respect to the latter vector bundle structure. Note also that we actually used only the invariance of L on horizontal vectors.
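As a concrete illustration of the maps R and B, consider the following sketch. The bundle, the function f and the hat notation for horizontal lifts and fundamental fields are choices made here for illustration; R is computed as the bracket of horizontal lifts minus the lift of the bracket, and B as the bracket of a horizontal lift with a fundamental field, following the descriptions above.

```latex
% Setting (assumed): Q = R^3 \ni (x,y,z), \; M = R^2 \ni (x,y), \; G = (R,+) acting by z \mapsto z + g,
% horizontal distribution HQ = \ker(dz - f(x,y,z)\,dx) for a smooth function f.

% Horizontal lifts of the base fields and the fundamental field of a \in g = R:
\widehat{\partial_x} = \partial_x + f\,\partial_z, \qquad
\widehat{\partial_y} = \partial_y, \qquad
\widehat{a} = a\,\partial_z .

% Connection 1-form (projection onto the g-part):
\omega = dz - f\,dx .

% Curvature-type map R (note [\partial_x,\partial_y] = 0 on the base):
R(q)(\partial_x,\partial_y)
  = [\widehat{\partial_x},\widehat{\partial_y}]
  = -\frac{\partial f}{\partial y}\,\partial_z .

% The map B:
B(q)(\partial_x, a) = [\widehat{\partial_x}, \widehat{a}\,]
  = -a\,\frac{\partial f}{\partial z}\,\partial_z, \qquad
B(q)(\partial_y, a) = [\widehat{\partial_y}, \widehat{a}\,] = 0 .
```

In this example B vanishes identically exactly when f does not depend on z, that is, when HQ is invariant under the translations in z, in agreement with Remark 4.3. The choice f = y gives a Heisenberg-type distribution with R(q)(∂_x, ∂_y) = −∂_z and B ≡ 0.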

Local description.
In order to derive some results in the next paragraph we need to introduce local coordinates adapted to the structure of the (non-invariant) Chaplygin system on π.
Consider any local trivialization Q ≈ M × G of the G-bundle π and local coordinates (q a ) = (x i , g α ) adapted to this trivialization (i.e., (x i ) with i = 1, . . . , m are coordinates on M and (g α ) with α = 1, . . . , k coordinates on G). Choose a local basis {e i } on TM and a basis {e α } of T e G. The set of vector fields { e i , e α } with i = 1, . . . , m and α = 1, . . . , k, consisting of the horizontal lifts of fields e i and the fundamental vector fields associated with elements e α ∈ g, is a local basis of vector fields on Q. We introduce a coordinate system (q a , v b ) = (x i , g α , y j , a β ) on TQ associated with this particular basis (recall our considerations from the first paragraph of Section 2). By its very definition these coordinates are naturally adapted to the splitting TQ = HQ ⊕ Q VQ, i.e., for a vector Z ∈ TQ represented by (x i , g α , y j , a β ) its HQ-projection is simply (x i , g α , y j , 0) and its VQ-projection is (x i , g α , 0, a β ). Moreover the g-projection of Z is ω(Z) = a α e α ∈ g, i.e., the considered coordinate system is also naturally compatible with the identification φ Q : VQ ≈ Q × g.
In the considered situation rule (2.1), which relates the induced coordinates q̇ a = (ẋ i , ġ α ) with v a = (y i , a α ), takes a special form with vanishing entries A i α (q) of the transition matrix, and entries A α β (q) depending only on G. Let R α ij (q) and B α iβ (q) be the coefficients of the maps R and B in the chosen g-basis, i.e., where X = X i e i , Y = Y j e j and a = a β e β . Clearly the above coefficients are also coefficients of R and B in basis e α , i.e., R(q)(X, Y ) = e α R α ij (q)X i Y j and B(q)(X, a) = e α B α iβ (q)X i a β . From our previous considerations and from the definition of R and B it follows that. Using the local coordinates (x i , g α , y j , a β ) we can easily describe the derivatives BL and FL introduced before. Namely, for a horizontal vector X q ∼ (x i , g α , y j , 0) and b = b α e α ∈ g, the curve X q·exp(t·b) corresponds to (x i , g α (t), y j , 0), where g α (t) is a local form of the flow of b. Thus h b X q is represented by (ẋ i = 0, ġ α = A α β (g)b β , ẏ j = 0, ȧ β = 0). Similarly, a curve X q + t · b q corresponds to (x i , g α , y j , t · b β ) and thus the vector V Xq b q is represented by (ẋ i = 0, ġ α = 0, ẏ j = 0, ȧ β = b β ). We conclude that
The geometry of admissible variations. In this part we shall study the standard admissible variations (i.e., elements of W st TQ ) along a given horizontal admissible path in T HQ . Our crucial tool in this study will be the splitting TQ = HQ ⊕ Q VQ introduced above.
Consider a horizontal admissible path X(t) = t t q(t) ∈ H q(t) Q, being the tangent lift of a horizontal curve q(t). Denote by x(t) = π(q(t)) the base projection of q(t) and by X(t) = π * X(t) ∈ T x(t) M the base projection of X(t). Take a generator ξ(t) ∈ T q(t) Q of the standard admissible variation δ ξ(t) X(t). According to Proposition 3.1 (ii) this variation is an element of T X(t) TQ. Taking into account the induced splitting TTQ = THQ ⊕ TQ TVQ and the fact that X(t) is horizontal, we have δ ξ(t) X(t) ∈ T X(t) HQ ⊕ T θ q(t) VQ, where θ q(t) stands for a null vector in V q(t) Q ⊂ T q(t) Q. Our goal now will be to describe the TVQ-part of this variation.
Observe that using the splitting TQ = HQ ⊕ Q VQ, we can decompose the generator ξ(t) itself into its horizontal and vertical parts ξ(t) = Y (t) + b(t), where Y (t) is a horizontal lift of some base curve Y (t) ∈ T x(t) M to q(t) and b(t) is a fundamental vector field associated with b(t) ∈ g taken at a point q(t). Clearly, due to (3.2), we have δ ξ(t) X(t) = δ Y (t) X(t) + δ b(t) X(t). In the result below we describe the TVQ-parts of these two components.
in the canonical identification TVQ ≈ TQ × g × g.
In this setting the assertion can be proved by a direct coordinate calculation. Applying formula (3.1), describing the coordinate form of the admissible variation (taking into account the coefficients of the Lie bracket C a bc (q) and transition matrices A a b (q) described in Proposition 4.5 and in equation (4.5)) one easily checks that δ b(t) X(t) corresponds to ẏ j = 0, as well as to The last three of these equations mean that the TQ × g × g-part of δ b(t) X(t) is precisely (4.6). The fact that the TQ-component is b(t) follows also directly from Proposition 3.1 (iii). This proves part (i).
A similar calculation as before shows that for this variation a γ = 0 and ȧ β = R β ij (q(t))y i (t)z j (t), which proves part (ii). From the linearity of the variation (3.2), we conclude that (4.8) is satisfied if and only if the g × g-component in the TVQ ≈ TQ × g × g-part of the variation δ ξ(t) X(t) vanishes. But this, in turn, means that the TVQ-part of this variation is trivial, and hence that the variation belongs to THQ. This proves part (iii).

Remark 4.7.
It follows from the above proof and from local forms of vectors h b X and V X b (considered at the end of the previous section) that we can decompose the variation δ b X into the following sum (with respect to the vector bundle structure τ TQ : TTQ → TQ) Hence, in the light of (4.3) and (4.4), the differentiation of L at X in the direction of δ b X reads
⟨dL( X(t)), δ b(t) X(t)⟩ = ⟨BL( X(t)), b(t)⟩ + ⟨FL( X(t)), ḃ(t) + B(q(t))(X(t), b(t))⟩ = ⟨BL( X(t)) − d/dt FL( X(t)), b(t)⟩ + ⟨FL( X(t)), B(q(t))(X(t), b(t))⟩ + d/dt ⟨FL( X(t)), b(t)⟩. (4.9)
The comparison problem. Now we are ready to formulate our main result. Its part (b) completely solves the comparison problem (Q2) for the (non-invariant) Chaplygin systems. Part (c) completely solves a variant of the inverse problem (Q3) when a vakonomic extremal corresponds to a particular choice of a Lagrange multiplier, whereas part (a) characterizes those nonholonomic trajectories which are simultaneously extremals of the unconstrained dynamics.

Theorem 4.8. For the (non-invariant) Chaplygin system introduced above: (a) A nonholonomic extremal X(t) ∈ H q(t) Q is an unconstrained one if and only if
for every t ∈ [t 0 , t 1 ] and every vector b ∈ g.
Proof. We shall first prove point (a). Our idea is very simple. Let us take a nonholonomic extremal X(t) ∈ H q(t) Q and a generator ξ(t) = Y (t) + b(t) (vanishing at the end-points) and consider the associated standard admissible variation δ ξ(t) X(t). In accordance with the spirit of Proposition 3.9, we would like to compare this variation with some nonholonomic admissible variation (with vanishing end-points). The splitting of the generator ξ(t) = Y (t) + b(t) provides a natural candidate for such a variation, namely δ Y (t) X(t). From the linearity of the variation with respect to the generator (3.2), we have ⟨dL( X), δ ξ X⟩ = ⟨dL( X), δ Y X⟩ + ⟨dL( X), δ b X⟩, or in the integrated version ⟨dS L ( X), δ ξ X⟩ = ⟨dS L ( X), δ Y X⟩ + ⟨dS L ( X), δ b X⟩. Since X(t) is a nonholonomic extremal, ⟨dS L ( X), δ Y X⟩ = 0 and thus (4.14) ⟨dS L ( X), δ ξ X⟩ = ⟨dS L ( X), δ b X⟩.
Integrating (4.9) we get By (4.14), vanishing of dS L ( X), δ b X for every b(t) ∈ g vanishing at the end-points is a necessary and sufficient condition for X(t) to be an unconstrained extremal. In the light of the above equation it is equivalent to the vanishing of the integrand for every such b(t). This proves point (a).
To prove (b) we shall proceed analogously with a modification that ξ(t) should now be a generator of a vakonomic admissible variation (still vanishing at the end-points). By Lemma 4.6 such generators are characterized by equation (4.8). Therefore we can modify (4.9) to the following form ⟨dL( X(t)), δ b(t) X(t)⟩ = ⟨BL( X(t)), b(t)⟩ − ⟨FL( X(t)), R(q(t))(X(t), Y (t))⟩.
The proof of point (c) is conceptually not much different from the proofs of the two previous parts. Let us start with explaining why the Lagrangian modified by a multiplier takes the form L(Z, t) = L(Z) − ⟨λ(t), ω(Z)⟩ for some λ(t) ∈ g * . This becomes clear, in the light of equation (3.5), if one observes that the horizontal distribution HQ ⊂ TQ is characterized by the equation ω(Z) = 0 where Z ∈ TQ.
Take now any nonholonomic admissible variation δ Y (t) X(t) with vanishing end-points. Since this variation is, in particular, also a standard admissible variation with vanishing end-points, and from the fact that X(t) is a solution of an unconstrained problem with the modified Lagrangian we know that and thus, after integration and hence dS L ( X), δ Y X = 0 (i.e., X(t) is a nonholonomic extremal) if and only if the above integral vanishes for every Y (t). By the standard argument this implies (4.12).
Other aspects of the comparison problem. Let us now explore some natural questions related to our results from the previous paragraph. First of all, as a simple consequence of our considerations in the proof of Theorem 4.8, we get the following characterization of vakonomic extremals corresponding to a prescribed multiplier λ(t) ∈ g * . Note that (4.16) can be understood as a linear equation defining the vakonomic multiplier. Observe also that (4.16) for λ(t) = 0 gives condition (4.10) from Theorem 4.8. This is by no means a coincidence: an unconstrained extremal is a vakonomic extremal with trivial vakonomic multiplier.

Proposition 4.9. A horizontal curve X(t) ∈ H q(t) Q is a vakonomic extremal associated with a modified Lagrangian L(Z, t) = L(Z) − ⟨λ(t), ω(Z)⟩ if and only if it satisfies the following two conditions:
for every Y (t) ∈ T π(q(t)) M vanishing at the end-points, and
Proof. The horizontal curve X(t) is a vakonomic extremal associated with the multiplier λ(t) if and only if ⟨dS L , δ ξ X⟩ vanishes on every admissible variation δ ξ(t) X(t) with vanishing end-points. Splitting the generator into its horizontal and vertical parts ξ(t) = Y (t) + b(t), and using the linearity (3.2), this amounts to checking the conditions ⟨dS L , δ Y X⟩ = 0 and ⟨dS L , δ b X⟩ = 0 separately.
In light of the proof of Theorem 4.8 (c), condition dS L , δ Y X = 0 is equivalent to (4.15). Now, Integrating the above equality by parts (and using the fact that b(t) vanishes at the end-points) we get By the standard reasoning, the vanishing of this integral for every b(t) ∈ g vanishing at the end points is equivalent to (4.16).
Another interesting issue is the relation between parts (b) and (c) in Theorem 4.8. Both parts give the necessary and sufficient conditions for a trajectory to be simultaneously a vakonomic and nonholonomic extremal. Therefore we should expect that conditions (4.11) and (4.12) are equivalent. This is indeed the case, when the form of the vakonomic multiplier (4.16) is taken into account. Therefore condition (4.11) can be viewed as a version of (4.12) when we have no explicit knowledge of the vakonomic multiplier. Note, however that the equivalence of these two conditions is a non-trivial statement. Lemma 4.10. Conditions (4.11) and (4.12) are equivalent. More precisely, (4.12) for some multiplier λ(t) satisfying (4.16) implies (4.11). Conversely, if (4.11) holds then there exists a multiplier λ(t) satisfying (4.16) such that (4.12) holds.
Proof. Choose an admissible path X(t) ∈ H q(t) Q. Consider any generator of a vakonomic variation (i.e., a pair Y (t) ∈ T π(q(t)) M and b(t) ∈ g satisfying (4.8)) and a multiplier λ(t) ∈ g * satisfying (4.16). We shall show that Indeed, the above formula can be justified by the following calculation (for simplicity of notation we do not write the time dependence explicitly): Assume now that (4.12) holds, i.e., the left-hand side of (4.17) vanishes. Restrict our attention to those b(t)'s that are the solutions of (4.8) and additionally vanish at the end-points. Integrating (4.17) for such b(t)'s we get (4.11). The passage from (4.11) to (4.12) requires more attention. Consider a class of solutions b(t) = b Y (t) of (4.8) (for all possible Y (t)'s) with the initial condition b(t 0 ) = 0. Of course we have no guarantee that b(t 1 ) = 0. The crucial observation is that, if (4.11) holds, then the value of the linear functional We conclude that Hence, since any linear function on b(t 1 ) ∈ g is determined by an element of g * , there exists α ∈ g * such that for any b(t) from the considered class. Taking this into account and integrating (4.17) we get: Now it is enough to choose λ(t) satisfying (4.16) such that λ(t 1 ) = FL( X(t 1 )) − α to guarantee that ∫ I ⟨λ, R(q)(X, Y )⟩ dt = 0 for any Y (t). This implies (4.12).
The symmetric case, relation with the classical results. Let us now see how our results look in the special cases of a (non-invariant) Chaplygin system subject to some symmetry conditions. We shall distinguish three particular situations: when HQ is G-invariant, when L is G-invariant, and the (standard) Chaplygin case (i.e., both the constraints and the Lagrangian are G-invariant). In the first case we obtain the following.
Corollary 4.11 (invariant constraints). Assume that the constraints HQ ⊂ TQ are G-invariant (i.e., (R g ) * X q = X q·g for any g ∈ G and X ∈ T π(q) M ). Then, by Remark 4.3, B ≡ 0 and thus
• Equation (4.8) defining the vakonomic variation reads ḃ(t) + R(q(t))(X(t), Y (t)) = 0.
• The vakonomic multiplier takes the form λ(t) = FL( X(t)) + const.
Now we can relate our results to the classical results from [6,7,8].
1. In the (standard) Chaplygin case the explicit construction of the vakonomic multiplier (being the momentum map FL( X(t)) shifted by a constant) appeared in [7] in Thm. 3.1 (ii) and Prop. 4, in [8] Prop. 3 (1) and Cor. 4, as well as along the lines of Sec. 6 in [6]. Our formula (4.16) is much more general as it allows us to find the vakonomic multiplier also for systems without symmetry. A similar equation, in the coordinate form, can be found in Prop. 4 in [6], yet there the relation between the multiplier and the momentum FL( X(t)) is not so explicit and the requirement that the multiplier is defined by a section over HQ is needed. Clearly, in the general Chaplygin case this last requirement may be too strong.

2. Theorem 4.8 (c) can be found in [8] Prop. 2 (for systems of mechanical type with regular Lagrangians) and in Theorem 2 in [6] (if the multiplier is defined by a section over HQ). Actually it is a consequence of the general formula of Rumiantsev [22], which requires the explicit knowledge of the vakonomic multiplier.
As we can derive the explicit value of the vakonomic multiplier in the (standard) Chaplygin case (given in general by (4.16)), one can formulate Theorem 4.8 (c) in this specific setting. This is exactly Thm. 3.1 in [7] (where the regularity of the Lagrangian is assumed), Prop. 3 (2) in [8] (for abelian Chaplygin systems with regular mechanical Lagrangians), as well as Cor. 1 and Prop. 6 in [6] (the Lagrangian has to be sufficiently regular). Note that our result works in a more general geometric situation (no symmetry) and without any regularity assumptions.
Actually Favretti in [7] formulates Theorem 4.8 (c) for invariant affine constraints. In this paper we concentrated solely on the linear case, yet extending Theorem 4.8 to the affine setting does not require much effort.
3. The criterion from Theorem 4.8 (b) was so far completely absent in the literature. According to Lemma 4.10 it can be understood as a version of Theorem 4.8 (c) in the case that the explicit value of the vakonomic multiplier is unknown (or impossible to derive). Again no regularity condition for the Lagrangian, nor symmetry requirements are needed in this case.
4. Observations similar to Theorem 4.8 (a) have been considered in the literature [7,8] as special cases of the more general result for vakonomic systems assuming the vanishing of the multiplier. Concrete examples were already discussed in Remark 3.8.

5. Fernandez and Bloch made, in [8] Prop. 3 (3), a remark that the property of being conditionally variational (i.e., answering (Q1) positively) is not affected by adding a base-dependent potential to the Lagrangian. In fact, it is obvious from our formula (4.11) that any change of the Lagrangian which does not change BL( X) and ⟨FL( X), R(q)(X, Y )⟩ preserves this property.
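The invariant-constraints case of Corollary 4.11 can be made explicit by a direct integration. The sketch below assumes the equation ḃ(t) + R(q(t))(X(t), Y(t)) = 0 quoted in that corollary together with the initial condition b(t_0) = 0; the Heisenberg-type example (Q = R^3 with HQ = ker(dz − y dx) and G = (R, +) acting by translations in z, for which R(q)(∂_x, ∂_y) = −∂_z under the bracket convention used for R) is an illustrative choice and is not taken from the text.

```latex
% General invariant case (B \equiv 0): the vertical part of a vakonomic generator
b(t) = -\int_{t_0}^{t} R(q(s))\big(X(s), Y(s)\big)\, ds ,

% and the variation has vanishing end-points only if
\int_{t_0}^{t_1} R(q(t))\big(X(t), Y(t)\big)\, dt = 0 .

% Heisenberg-type example: with X(t) = \dot x\,\partial_x + \dot y\,\partial_y and
% Y(t) = Y^1\,\partial_x + Y^2\,\partial_y one gets, identifying g = R,
R(q)(X, Y) = -\big(\dot x\, Y^2 - \dot y\, Y^1\big), \qquad
b(t) = \int_{t_0}^{t} \big(\dot x\, Y^2 - \dot y\, Y^1\big)\, ds .
```

In the example, the end-point condition on b is a single integral constraint on the base generator Y(t), which illustrates why vakonomic variations with vanishing end-points form a smaller class than one might expect from the base variations alone.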

Left invariant systems on Lie groups
In this section we shall solve the consistency problems (Q2) and (Q3) for a class of systems on Lie groups with left-invariant constraints. Such situations were considered for instance by Koiller [17] under the name generalized rigid body with constraints (a term he attributes to Arnold [1]). In contrast to the standard treatment, here the invariance of the Lagrangian will not be assumed. These results are related to the (non-invariant) Chaplygin systems considered in Section 4 (see Remarks 5.7 and 5.8), yet in some cases extend these as explained at the end of Remark 5.7.
Geometric setting. Consider a Lie group H and denote by h its tangent space at the identity, equipped with the canonical left Lie algebra structure [·, ·] h : h × h → h. In the remaining part of this section we shall extensively use the canonical trivialization of TH: In the induced trivialization Proof. The justification of formula (5.1) is very simple: standard admissible curves in TH are simply the tangent lifts of base curves in H. Thus a curve (h(t), η(t)) corresponds to an admissible curve if and only if its image h(t) * η(t) ∈ TH is the tangent lift of the base projection h(t).
To prove (5.2) consider an admissible curve (h(t), η(t)) ∈ H × h and a generator (h(t), ξ(t)) ∈ H × h of the admissible variation δ ξ(t) η(t) ∈ TTH. By Proposition 3.1 this variation projects to the admissible curve under τ TH and to the generator under Tτ H . This explains the first two entries in the triple (5.2).
To justify the last entry choose a basis {e α } of TH consisting of left-invariant vector fields on H. Clearly, in the induced coordinates (h α , a β ) on TH adapted to this basis, the projection to the h-factor in TH ≈ H × h reads simply (h α , a β ) → a β e β ∈ h. Moreover, the Lie bracket of left-invariant vector fields [e β , e γ ] = e α c α βγ has constant coefficients c α βγ being the structure constants of the Lie algebra h. Now for η(t) = η α (t)e α and ξ(t) = ξ α (t)e α , formula (3.1) shows that, in the induced coordinates on TTH, the ȧ α -entry in the coordinate formula for the admissible variation reads ξ̇ α (t) + c α βγ η β (t)ξ γ (t). This corresponds precisely to the element ξ̇(t) + [η(t), ξ(t)] h in the induced projection of TTH ≈ TH × h × h to its second h-factor.
Observe that for a given η(t) ∈ h and initial point h(0) = h 0 ∈ H equation (5.1) determines a unique solution h(t). Therefore, with some abuse of notation, we shall sometimes refer to η(t) itself as an admissible trajectory (keeping a fixed initial point h 0 in mind). Similarly, we shall denote by δ ξ(t) η(t) the admissible variation of the form (5.2).
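The h-part of the admissible variation, ξ̇(t) + [η(t), ξ(t)]_h, has a familiar form for the rotation group. The following sketch assumes H = SO(3) and the standard identification of its Lie algebra with (R^3, ×); this example is an illustration added here and is not part of the original argument.

```latex
% Identification so(3) \cong R^3 via the hat map, under which
[\hat u, \hat v] = \widehat{u \times v}, \qquad u, v \in R^3 .

% For \eta(t), \xi(t) \in R^3 \cong so(3), the h-part of the admissible variation reads
\dot\xi(t) + [\eta(t), \xi(t)]_{\mathfrak h} \;=\; \dot\xi(t) + \eta(t) \times \xi(t) .
```

With η interpreted as the body angular velocity, this is the usual form of constrained variations appearing in the reduced variational description of the free rigid body.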
In the remaining part of this section we shall compare the nonholonomically and vakonomically constrained dynamics associated with a Lagrangian function L : TH ≈ H × h → R and a left-invariant distribution D ⊂ TH. Clearly, such a distribution, in the canonical trivialization TH = H × h, corresponds to H × d, where d ⊂ h is a linear subspace. We shall denote the restricted variational principles associated with D by P nh d = (L, T d , W nh d ) and P vak d = (L, T d , W vak d ). Clearly, an admissible trajectory η(t) ∈ h belongs to T d if and only if η(t) ∈ d for every t ∈ [t 0 , t 1 ].
Consider now any splitting h = d ⊕ d ′ into the direct sum of linear subspaces. Denote by P : h → d and P ′ : h → d ′ the canonical projections of h onto the factors of this splitting. Note that every generator ξ(t) ∈ h of an admissible variation δ ξ(t) η(t) can be decomposed as ξ(t) = a(t)+b(t), where a(t) = P (ξ(t)) ∈ d and b(t) = P ′ (ξ(t)) ∈ d ′ . Decomposing the h-part $\dot\xi(t) + [\eta(t), \xi(t)]_{\mathfrak{h}}$ of an admissible variation δ ξ(t) η(t) into d and d ′ -components allows us to characterize vakonomic admissible variations among all admissible variations.
Proposition 5.2. In the above setting, an admissible variation δ ξ(t) η(t) along an admissible trajectory η(t) ∈ d generated by ξ(t) = a(t) + b(t) belongs to W vak d if and only if
Every nonholonomic admissible variation along η(t) ∈ d is generated by ξ(t) = a(t) ∈ d and thus the decomposition of its h-part into d and d ′ -components reads as
The comparison problem. The results of this paragraph describe the solutions of the comparison problems (Q2) and (Q3) for systems introduced in the previous paragraph. We also answer the question of when a nonholonomic extremal is an unconstrained one. In general we follow the line sketched in the previous Section 4, but now the splitting h = d ⊕ d ′ plays the role of the splitting $TQ = HQ \oplus_Q VQ$.
Before formulating our main result note that, thanks to the canonical decomposition TH ≈ H × h, we can treat the Lagrangian L : TH → R as defined on the product H × h. Therefore we can differentiate L(h, η) with respect to h and η separately. These differentials allow us to express concisely the differential of L in the direction of an admissible variation δ ξ η:
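The decomposition ξ = a + b with respect to a splitting h = d ⊕ d ′ is plain linear algebra and can be sketched as follows. This is an illustrative sketch only: the three-dimensional example, the basis vectors and the helper names `solve` and `split` are assumptions, not objects from the text. It also shows that changing the complement d ′ changes both components, while their difference stays in d.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def split(xi, d_basis, dprime_basis):
    """Return (a, b) with a = P(xi) in d, b = P'(xi) in d', and xi = a + b."""
    basis = list(d_basis) + list(dprime_basis)
    n = len(xi)
    # Columns of the matrix are the basis vectors of d followed by those of d'.
    matrix = [[basis[j][i] for j in range(n)] for i in range(n)]
    coeffs = solve(matrix, list(xi))
    k = len(d_basis)
    a = [sum(coeffs[j] * d_basis[j][i] for j in range(k)) for i in range(n)]
    b = [xi[i] - a[i] for i in range(n)]
    return a, b

# Hypothetical example: d = span{(1,0,0)} with two different complements.
d = [(1.0, 0.0, 0.0)]
d_prime_1 = [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
d_prime_2 = [(1.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
xi = (2.0, 3.0, 5.0)
a1, b1 = split(xi, d, d_prime_1)
a2, b2 = split(xi, d, d_prime_2)
```

In this example the difference `b2 - b1` is a multiple of the basis vector of d, in line with the observation that the passage between two complements is described by a d-valued map.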

(5.4)
Here for any φ ∈ h * and any η ∈ h, $\langle \mathrm{ad}^*_\eta \phi, \cdot \rangle = \langle \phi, [\eta, \cdot]_{\mathfrak{h}} \rangle$. The relation of the differentials $h^* \frac{\partial L}{\partial h}$ and $\frac{\partial L}{\partial \eta}$ with the differentials BL and FL introduced in the previous Section 4 will be explained in Remark 5.7. Now we are ready to state the main result of this section.
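The defining relation of ad* can be made concrete in coordinates. The sketch below is an illustration under assumptions: it uses the Lie algebra se(2) with basis (e1, e2, ∂ϕ) and the bracket convention [∂ϕ, e1] = e2, [∂ϕ, e2] = −e1, [e1, e2] = 0; the function names are hypothetical. Covectors are stored as component triples in the dual basis.

```python
def bracket(eta, xi):
    """se(2) bracket in the basis (e1, e2, d_phi), under the convention stated above."""
    return (xi[2] * eta[1] - eta[2] * xi[1],
            eta[2] * xi[0] - xi[2] * eta[0],
            0.0)

def pair(phi, v):
    """Canonical pairing <phi, v> between a covector and a vector."""
    return sum(p * c for p, c in zip(phi, v))

def coad(eta, phi):
    """Components of ad*_eta phi, defined by <ad*_eta phi, e_k> = <phi, [eta, e_k]>."""
    units = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    return tuple(pair(phi, bracket(eta, e)) for e in units)

def jacobi_defect(x, y, z):
    """[x,[y,z]] + [y,[z,x]] + [z,[x,y]]; vanishes for a Lie bracket."""
    terms = (bracket(x, bracket(y, z)),
             bracket(y, bracket(z, x)),
             bracket(z, bracket(x, y)))
    return tuple(sum(t[i] for t in terms) for i in range(3))
```

By linearity, `pair(coad(eta, phi), xi)` agrees with `pair(phi, bracket(eta, xi))` for every ξ, which is exactly the relation quoted above.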

Theorem 5.3. For the systems described above:
(a) A nonholonomic extremal (h(t), η(t)) ∈ H × d is an unconstrained extremal if and only if the covector

(b) A nonholonomic extremal (h(t), η(t)) ∈ H × d is a vakonomic extremal if and only if
for every pair a(t) ∈ d and b(t) ∈ d ′ vanishing at the end-points and related by equation (5.3).
Now $\langle dS_L, \delta_a \eta \rangle$ vanishes since η(t) is a nonholonomic extremal and $\delta_a \eta$ is a nonholonomic admissible variation vanishing at the end-points. Clearly η(t) is an unconstrained extremal if and only if $\langle dS_L, \delta_b \eta \rangle$ vanishes for every b(t) ∈ d ′ vanishing at the end-points. In light of (5.4) we see that $\langle dS_L, \delta_b \eta \rangle$ vanishes if and only if
By the standard reasoning we get condition (5.5).
To prove (b) take (h(t), η(t)) as above and consider a vakonomic admissible variation δ ξ(t) η(t) generated by ξ(t) = a(t) + b(t) vanishing at the end-points. From Proposition 5.2 we conclude that the curves a(t) and b(t) are related by equation (5.3). Clearly, again $\langle dS_L, \delta_a \eta \rangle = 0$ and thus η(t) is a vakonomic extremal if and only if $\langle dS_L, \delta_b \eta \rangle = 0$ for every such b(t). Now we can write
Integrating the above equation over I = [t 0 , t 1 ] we get condition (5.6).
To prove (c) observe first that the constrained distribution is characterized in h by the equation P ′ (η) = 0. Thus the general form of the modified vakonomic Lagrangian is
where λ(t) ∈ h * . Clearly, since the additional factor in the Lagrangian vanishes for every η ∈ d, we can take λ(t) = (P ′ ) * λ(t), thus restricting our attention to λ(t) ∈ Ann(d) ≈ (d ′ ) * . Now take a vakonomic extremal (h(t), η(t)) ∈ H × d associated with such a λ(t) and consider any nonholonomic admissible variation δ a(t) η(t) with vanishing end-points. Now the second part of Proposition
Integrating the above equality over I, and taking into account that $\langle dS_L, \delta_a \eta \rangle = 0$ since η(t) is an unconstrained extremal of L, we get
Clearly $\langle dS_L, \delta_a \eta \rangle = 0$ if and only if condition (5.7) holds.
Similarly to the Chaplygin case, in the setting considered in this section, we can deduce the equation defining the vakonomic multiplier. Moreover, due to a simple structure of TH, we are able to derive the nonholonomic equations of motion.
Lemma 5.4. An admissible curve (h(t), η(t)) ∈ H × d is:
• a nonholonomic extremal if and only if the covector (5.5) for every t ∈ [t 0 , t 1 ] annihilates every a ∈ d ⊂ h:
• a vakonomic extremal associated with a modified Lagrangian for every t ∈ [t 0 , t 1 ] and every a ∈ d, and
Proof. The characterization of the nonholonomic extremals follows directly from formula (5.4) taken for
Now, by the linearity of the variation with respect to the generator (3.2), (h(t), η(t)) is a vakonomic extremal associated with L if and only if $\langle dS_L, \delta_a \eta \rangle = 0$ and $\langle dS_L, \delta_b \eta \rangle = 0$ for all generators a(t) ∈ d and b(t) ∈ d ′ vanishing at the end-points. It follows from the proof of Theorem 5.3 (c) that the condition $\langle dS_L, \delta_a \eta \rangle = 0$ is equivalent to $\langle dS_L, \delta_a \eta \rangle = \int_I \langle \lambda(t), P'[\eta(t), a(t)]_{\mathfrak{h}} \rangle \, dt$. Now using formula (5.4) and the restriction λ(t) = (P ′ ) * λ(t), we transform the above equality into
The integrands are equal for every a(t) ∈ d vanishing at the end-points if and only if (5.9) holds.
To justify (5.10) let us calculate,
In the above calculation we used the fact that λ(t) = (P ′ ) * λ(t). Integrating the above equality over I one gets that $\langle dS_L, \delta_b \eta \rangle = 0$ if and only if
for every b(t) ∈ d ′ ⊂ h vanishing at the end-points. By the standard reasoning this is equivalent to condition (5.10).

Discussion of the main result.
Let us now discuss some aspects of our results from the previous paragraph. The first matter is the role of the choice of the completing factor d ′ ⊂ h. Obviously our characterizations are "if and only if", which suggests that they should not depend on this choice (which, let us remind, was arbitrary). This is indeed the case as we explain below in detail.

Remark 5.5. Consider two splittings
Observe that d ⊂ ker ∆P and that P ′ 2 = P ′ 1 +∆P . In other words the linear d-valued map ∆P describes the passage between both splittings. Our goal now is to show that all d ′ -dependent conditions from our previous considerations are preserved under the substitution of an element b ∈ d ′ 1 by an element b + ∆P (b) ∈ d ′ 2 , P 1 by P 2 , etc.
• Condition (a) from Theorem 5.3 reads as
• Equation (
• The above point will be helpful in showing the splitting-independence of condition (b) from Theorem 5.3. Namely, the above considerations guarantee that if d ⊕ d ′ 1 -factors (a 1 , b 1 ) of ξ satisfy (a) with P ′ = P ′ 1 then its d ⊕ d ′ 2 -factors (a 2 = a 1 − ∆P (ξ), b 2 = b 1 + ∆P (ξ)) satisfy (a) with P ′ = P ′ 2 . Note also that both pairs simultaneously vanish at the end-points. Now
We see that if η(t) is a nonholonomic trajectory and ξ vanishes at the end-points then $\int_I I_0(t)\,dt = 0$ and thus
Remark 5.7 (System on Lie groups as (non-invariant) Chaplygin systems). Consider now a special case of a left-invariant system on a Lie group H such that the completing linear subspace d ′ ⊂ h is, in fact, a Lie subalgebra d ′ = g ⊂ h corresponding to a closed Lie subgroup G ⊂ H. Now we are in the setting of a generalized Chaplygin system on the principal G-bundle Q = H over the homogeneous space M = H/G of right quotients equipped with the canonical right G-action (see [16]). The splitting d ⊕ g = h defines the canonical splitting of TQ into its horizontal and vertical parts: HQ = H ×d ⊂ H ×h ≈ TH and VQ = H ×g ⊂ H ×h ≈ TH. The latter identification is precisely the canonical trivialization VQ ≈ Q × g considered in the previous Section 4. Clearly, since (R g ) * η = (L g ) * ad g −1 η, the right G-action in the trivialization TH ≈ H × h reads as (R g ) * (h, η) = (hg, ad g −1 η). We clearly see that the horizontal distribution is G-invariant if and only if ad G d ⊂ d (and thus, in particular, [g, d] ⊂ d).
Consequently, unless the latter is satisfied we deal with a truly non-invariant Chaplygin system.
It is not difficult to translate our considerations from Section 4 into the language of the present section. The canonical identification of the punctured fibre (hG, h) with (G, e) is given by hg → g, and thus the fundamental vector field associated with b ∈ g is just the left-invariant vector field h → h * b ≈ (h, b). Given a vector X ∈ T M and its horizontal lift X h = h * η ≈ (h, η) at h ∈ H, we conclude that its lift to hg ∈ H is X hg ≈ (hg, P ad g −1 η ). This follows directly from the fact that X qg is the horizontal projection of (R q ) * X q . This can be seen also from a different perspective. Following [16] we can identify TM with the product bundle H × G (h/g), where G acts on h/g by ad g −1 . Identifying h/g with d we get that TM ≈ H × G d, where the G action is given by P • ad g −1 (the fact that this is indeed a G-action follows from the fact that ad G g ⊂ g = ker P ). Now taking horizontal vectors X ≈ (h, η) and Y ≈ (h, a) at q = h ∈ H = Q (here of course η, a ∈ d), and an element b ∈ g we easily get
The first three formulas follow easily from the respective definitions in Section 4. We will prove the last, not so obvious, formula. Recall that $\langle BL(X), b \rangle = \frac{d}{dt}\big|_{t=0} L(X_{q \cdot g(t)})$, where g(t) is the flow of b. Now, since X q·g(t) = P (ad g(t) −1 η), we get
Now, one can easily check that, under identifications (5.11a)-(5.11d), formula (4.8) becomes (5.3), condition (4.10) becomes (5.5), condition (4.11) becomes (5.6) and condition (4.12) becomes (5.7).
In the light of the above considerations, we may understand Theorem 4.8 as a generalization of Theorem 5.3 -we can substitute the Lie group H with a general manifold Q. On the other hand, the results of Theorem 5.3 apply even if the system on H is not a subject of the action of a closed Lie subgroup G ⊂ H (the completing subspace d ′ does not have to be a subalgebra). Thus it covers also situations beyond the reach of Theorem 4.8.
From a different perspective, we can treat left-invariant systems discussed in this remark as a particular class of examples of generalized Chaplygin systems, and thus understand Theorem 5.3 restricted to the systems discussed in this remark as a general example of the usage of Theorem 4.8.

Remark 5.8 (System on Lie groups as (non-invariant) Chaplygin systems -another viewpoint).
There is also another way of understanding the geometric situation described in the previous Remark 5.7. Namely, we can treat this system as a Chaplygin system on the left-principal G-bundle H = Q over the homogeneous space G\H of left quotients, equipped with the canonical left G-action (note that our considerations from Section 4 can be repeated also for left principal bundles, under the condition that the adjectives "left" and "right" are carefully interchanged). In this situation, again the splitting d ⊕ g = h defines the splitting of TQ into its horizontal part HQ = H × d (which is now invariant with respect to the action) and another part H × g. Yet now, due to the fact that the canonical identification (Gh, h) ≈ (G, e) is given by gh → g, the fundamental vector field associated with an element b ∈ g is h → (R h ) * b = (L h ) * ad h −1 b. It follows that the canonical splitting of T h H into its horizontal and vertical part is (L h ) * d ⊕ (R h ) * g, and not (L h ) * d ⊕ (L h ) * g. Thus at each point h ∈ H we have two different identifications of T h H with d ⊕ g. The passage between these two splittings is provided by
Now given two horizontal vectors X = (L h ) * η and Y = (L h ) * a at h ∈ H (clearly a, η ∈ d) and an element b ∈ g one easily shows that
We leave it as an exercise to check that, under identifications (5.13a)-(5.13d), formula (4.8) is equivalent to (5.3), condition (4.10) to (5.5), condition (4.11) to (5.6) and condition (4.12) to (5.7), where the former are derived for pairs (a ′ , b ′ ) which are related with pairs (a, b) by (5.12).
We see that in this specific situation we were able to interpret a left-invariant system on a Lie group as a left-Chaplygin system, with invariant horizontal distribution (yet no requirements of the invariance of the Lagrangian were needed). The price that had to be paid for this invariance is the quite complicated passage between the two trivializations of T h H:

Examples
We shall end this paper by considering some well-known examples of nonholonomic systems with linear constraints. All these examples lie in the common setting of Sections 4 and 5 described in Remark 5.7. Thus they can be understood as a practical demonstration of our results from both Sections 4 and 5.
Consider a unicycle moving on the plane without slipping (see Figure 1). Any configuration of this system is determined by the contact point (x, y) ∈ R 2 , an angle ϕ ∈ S 1 , which indicates the direction of movement, and an angle θ ∈ S 1 describing the rotation of the wheel. The configuration space is, hence, the Lie group H = SE(2) × S 1 with natural coordinates (x, y, ϕ, θ). We denote by m and R the mass and the radius of the wheel, and by I and J the inertia of the wheel with respect to the ∂ θ - and ∂ ϕ -axes, respectively.
In fact, these are the left-invariant vector fields on H subject to the following commutation relations
Thus {e 1 , e 2 , ∂ ϕ , ∂ θ } is a basis of the left Lie algebra h of the group H. Let now $(\alpha, \beta, \dot\varphi, \dot\theta)$ be the fiber-wise coordinates in TH with respect to this basis. One easily shows that $\alpha = \dot x \cos\varphi + \dot y \sin\varphi$ and $\beta = \dot y \cos\varphi - \dot x \sin\varphi$. Therefore the Lagrangian reads
Since it depends on the h-coefficients only, we conclude that L is a left-H-invariant function. The constrained distribution is characterized by the equations $\alpha = R\dot\theta$ and β = 0 and thus it is spanned by the fields $e_1 + \frac{1}{R}\partial_\theta$ and $\partial_\varphi$. We clearly see that we are in the situation described in Section 5, of a system on the Lie group H with the left-H-invariant constrained distribution corresponding to the subspace $d = \mathrm{span}\{e_1 + \frac{1}{R}\partial_\theta, \partial_\varphi\} \subset \mathfrak{h}$. The natural choice of the completing subspace is d ′ = span{e 1 , e 2 } ⊂ h, which is not only a subspace, but also an abelian subalgebra of h corresponding to the connected abelian subgroup of translations G := R 2 ⊂ SE(2). For this reason we shall denote d ′ = g.
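The passage from the velocities $(\dot x, \dot y)$ to the fiber-wise coordinates (α, β), and the constraint equations for the unicycle, can be sketched in a few lines. This is only an illustration of the formulas quoted above; the function names and the numerical tolerance are assumptions.

```python
import math

def body_velocity(x_dot, y_dot, phi):
    """Fiber-wise coordinates (alpha, beta) of the planar velocity in the moving frame."""
    c, s = math.cos(phi), math.sin(phi)
    alpha = x_dot * c + y_dot * s
    beta = y_dot * c - x_dot * s
    return alpha, beta

def satisfies_constraints(x_dot, y_dot, phi, theta_dot, radius, tol=1e-12):
    """Check the rolling-without-slipping conditions alpha = R * theta_dot and beta = 0."""
    alpha, beta = body_velocity(x_dot, y_dot, phi)
    return abs(alpha - radius * theta_dot) < tol and abs(beta) < tol
```

For instance, a wheel of radius 0.5 spinning with $\dot\theta = 2$ and heading ϕ = 0.3 must move with velocity (cos 0.3, sin 0.3); any sideways component violates β = 0. Since (α, β) is obtained from $(\dot x, \dot y)$ by a rotation, α² + β² equals $\dot x^2 + \dot y^2$.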
Within this choice of the completing subspace we are, in fact, in the common setting of Sections 4 and 5 described in Remark 5.7. We can now apply the methods developed in these sections to study the comparison problems for this system.
We may further ask which nonholonomic extremals are unconstrained ones. A short calculation (using the commutation relations and the form of $\frac{\partial L}{\partial \eta}$) shows that condition (5.5) is equivalent to $\dot\alpha = 0$ and $\alpha\dot\varphi = 0$.
To check if these equations are satisfied we need to derive the nonholonomic equations of motion (5.8).
Again skipping some simple calculations we arrive at $\dot\alpha = 0$ and $\ddot\varphi = 0$. We conclude that both α and $\dot\varphi$ are constants of motion. Moreover, a nonholonomic extremal is an unconstrained one if and only if at least one of these constants vanishes.
With a little more effort one can determine the vakonomic multipliers λ(t) for the system in question. The general form of any
One can easily show that equations (5.10) read as $\dot f = g\dot\varphi + m\dot\alpha$ and $\dot g = \dot\varphi(m\alpha - f)$.
Finally note that we can modify the Lagrangian L by adding any potential term U (h) invariant in G-directions without changing the answer to question (Q1) and without changing the equation for a vakonomic multiplier. Indeed, in this situation $\langle h(t)^* \frac{\partial L}{\partial h}, b \rangle = 0$ for any b ∈ g and thus the additional term will not alter the left-hand side of formulas (5.6) and (5.10). However, the nonholonomic equations of motion could be affected by such an addition.
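Along a nonholonomic trajectory of the unicycle both α and $\dot\varphi$ are constant, so the multiplier equations reduce to a linear system with constant coefficients. The sketch below integrates it with a classical Runge-Kutta scheme and compares against the closed-form solution obtained by substituting u = f − mα, which turns the system into a rotation of the pair (u, g). The parameter values and function names are hypothetical; the equations themselves are taken as $\dot f = g\dot\varphi + m\dot\alpha$, $\dot g = \dot\varphi(m\alpha - f)$ with $\dot\alpha = 0$.

```python
import math

def multiplier_rhs(f, g, m, alpha, omega):
    """Right-hand sides of f' = g*omega, g' = omega*(m*alpha - f) (alpha, omega constant)."""
    return g * omega, omega * (m * alpha - f)

def integrate_multiplier(f0, g0, m, alpha, omega, t_end, steps=2000):
    """Classical RK4 integration of the multiplier equations on [0, t_end]."""
    dt = t_end / steps
    f, g = f0, g0
    for _ in range(steps):
        k1 = multiplier_rhs(f, g, m, alpha, omega)
        k2 = multiplier_rhs(f + 0.5 * dt * k1[0], g + 0.5 * dt * k1[1], m, alpha, omega)
        k3 = multiplier_rhs(f + 0.5 * dt * k2[0], g + 0.5 * dt * k2[1], m, alpha, omega)
        k4 = multiplier_rhs(f + dt * k3[0], g + dt * k3[1], m, alpha, omega)
        f += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        g += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return f, g

def exact_multiplier(f0, g0, m, alpha, omega, t):
    """Closed form: (f - m*alpha, g) rotates with angular velocity omega."""
    u0 = f0 - m * alpha
    c, s = math.cos(omega * t), math.sin(omega * t)
    return m * alpha + u0 * c + g0 * s, g0 * c - u0 * s
```

In particular the quantity (f − mα)² + g² is conserved, so under these assumptions the multiplier stays bounded along every nonholonomic trajectory.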
Let us summarize our considerations.

Conclusion.
The unicycle answers question (Q1) positively. In such a situation nonholonomic trajectories are characterized by the equations α = const and $\dot\varphi$ = const. A nonholonomic trajectory is an unconstrained one if and only if at least one of these constants vanishes.
The positive answer to (Q1) will not be affected by adding a G-invariant potential term to the Lagrangian.

Example 6.2. The two-wheeled carriage
This example of a nonholonomically constrained system is introduced following [7,6]. It is a natural generalization of the system studied in the previous paragraph. Again it is a system on a Lie group with a left-invariant constraints distribution and, by choosing a proper completing distribution, we can consider it in the common setting of Sections 4 and 5 as discussed in Remark 5.7.
Consider a two-wheeled carriage moving without slipping on the plane (see Figure 2). The position of the carriage is determined by an element ((x, y, ϕ), ψ 1 , ψ 2 ) of the Lie group H = SE(2) × S 1 × S 1 . Here (x, y) is the position of the center of mass, the angle ϕ describes the orientation of the axis, while the angles ψ 1 and ψ 2 specify the rotation of the wheels. Parameters of the system include its total mass m = m 0 + 2m 1 , being the sum of the mass of the carriage m 0 and the two masses of the wheels m 1 ; the distance l between the center of mass and the axis; the inertia J of the whole system with respect to the ∂ ϕ -axis; the inertia I of each wheel; the distance 2w between the wheels; and the radius R of each wheel.
It is a left-H-invariant function, since there is no dependence on the base coordinates. The constraints are characterized by the equations $\alpha = R\dot\theta_1$, β = 0 and $w\dot\varphi = R\dot\theta_2$. Thus the constraints distribution is a left-H-invariant distribution corresponding to the linear subspace $d = \mathrm{span}\{e_1 + \frac{1}{R}\partial_{\theta_1}, \partial_\varphi + \frac{w}{R}\partial_{\theta_2}\} \subset \mathfrak{h}$. By choosing the completing subspace $d' = g = \mathrm{span}\{e_1, e_2, \partial_\varphi\} \subset \mathfrak{h}$ (which is a subalgebra corresponding to the connected subgroup G = SE(2) ⊂ H), we position ourselves in the situation described in Remark 5.7. Therefore we may apply the methods of Section 5 to study the comparison problems for the system.
We clearly see that if l = 0 or $\dot\varphi = 0$ then the above expression vanishes identically, and thus condition (5.6) is satisfied. In this way we repeat the observation (made already in [2,6,7,8]) that for l = 0 the two-wheeled carriage answers (Q1) positively. Dealing with the case $l \neq 0$ requires much more attention: we will need the information provided by the nonholonomic equations of motion (5.8) and by the equation for vakonomic admissible variations (5.3). A short computation of (5.8) involving the commutation relations shows that the nonholonomic extremals satisfy
$$\dot\alpha = X\dot\varphi^2, \tag{6.2a}$$
$$\ddot\varphi = -Y\alpha\dot\varphi, \tag{6.2b}$$
where $X = \frac{m_0 l R^2}{mR^2 + 2I}$ and $Y = \frac{m_0 l R^2}{JR^2 + 2Iw^2}$ depend on the parameters of the system. On the other hand, the pair $a(t) = E(t)(e_1 + \frac{1}{R}\partial_{\theta_1}) + F(t)(\partial_\varphi + \frac{w}{R}\partial_{\theta_2}) \in d$ and $b(t) = A(t)e_1 + B(t)e_2 + C(t)\partial_\varphi \in g$ satisfies
In condition (5.6) we need to restrict our attention to curves vanishing at the end-points, and thus C(t) ≡ 0. Now the above system becomes
$$\dot A = \dot\varphi B, \tag{6.3a}$$
$$\dot B = -\dot\varphi A + (\alpha F - \dot\varphi E). \tag{6.3b}$$
In the light of condition (5.6) and equation (6.1), a nonholonomic trajectory (for a carriage with $l \neq 0$) satisfying (6.2a)-(6.2b) is a vakonomic one if and only if $\int_I \dot\varphi(\alpha F - \dot\varphi E)\,dt = 0$ for every (A, B, C, E, F ) satisfying (6.3a)-(6.3b) and vanishing at the end-points. Assume that we have such a solution and let us calculate (in the integration by parts we use the fact that A and B vanish at the end-points) $\int_I \dot\varphi(\alpha F - \dot\varphi E)\,dt$
We conclude that if XY = 1 then every nonholonomic trajectory is a vakonomic one, thus our system answers (Q1) positively. Condition XY = 1 describes precisely the special configuration of the system found by a different method by Crampin and Mestdag [6].
Let us end the discussion of this example by studying the vakonomic multiplier λ(t). The general form of every λ(t) ∈ Ann(d) is $\lambda(t) = f(t)(e_1^* - R\,d\theta_1) + g(t)(d\varphi - \frac{R}{w}\,d\theta_2) + h(t)e_2^*$.
Equation (5.10) gives the following set of equations defining the coefficients f , g and h:
$$\dot f = m\dot\alpha - m_0 l \dot\varphi^2 + \dot\varphi h,$$
$$\dot g = J\ddot\varphi + m_0 l \dot\varphi\alpha - \alpha h,$$
$$\dot h = m_0 l \ddot\varphi + m\alpha\dot\varphi - \dot\varphi f.$$
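The nonholonomic equations (6.2a)-(6.2b) of the carriage are easy to explore numerically. The following sketch integrates $\dot\alpha = X\dot\varphi^2$, $\ddot\varphi = -Y\alpha\dot\varphi$ with a classical Runge-Kutta scheme. The parameter values and function names are hypothetical. As a sanity check it monitors the quantity $Y\alpha^2 + X\dot\varphi^2$, which is constant along solutions of these two equations (differentiate and substitute); this conserved quantity is an observation made here for testing purposes, not a statement taken from the text.

```python
def carriage_constants(m0, m1, l, R, I, J, w):
    """Constants X and Y of equations (6.2a)-(6.2b), with total mass m = m0 + 2*m1."""
    m = m0 + 2 * m1
    X = m0 * l * R ** 2 / (m * R ** 2 + 2 * I)
    Y = m0 * l * R ** 2 / (J * R ** 2 + 2 * I * w ** 2)
    return X, Y

def rhs(alpha, omega, X, Y):
    """Right-hand sides: alpha' = X*omega^2, omega' = -Y*alpha*omega (omega = phi')."""
    return X * omega ** 2, -Y * alpha * omega

def integrate(alpha0, omega0, X, Y, t_end, steps=5000):
    """Classical RK4 integration of (6.2a)-(6.2b) on [0, t_end]."""
    dt = t_end / steps
    a, w = alpha0, omega0
    for _ in range(steps):
        k1 = rhs(a, w, X, Y)
        k2 = rhs(a + 0.5 * dt * k1[0], w + 0.5 * dt * k1[1], X, Y)
        k3 = rhs(a + 0.5 * dt * k2[0], w + 0.5 * dt * k2[1], X, Y)
        k4 = rhs(a + dt * k3[0], w + dt * k3[1], X, Y)
        a += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        w += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return a, w

def invariant(alpha, omega, X, Y):
    """Quantity Y*alpha^2 + X*omega^2, constant along solutions of (6.2a)-(6.2b)."""
    return Y * alpha ** 2 + X * omega ** 2
```

For l = 0 both constants vanish and the sketch reproduces the unicycle-like behaviour α = const, $\dot\varphi$ = const.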
Using these equations we may characterize these nonholonomic trajectories which are simultaneously uncon-
Also the equation (5.10) defining the vakonomic multiplier λ(t) ∈ Ann(d) ⊂ h * (such λ(t) has to be of the form f (t)(dz − ydx + xdy)) has a particularly simple form. Namely,
We can now summarize our considerations.
Conclusion. The Heisenberg system answers question (Q1) positively. In fact, every nonholonomic extremal is also an extremal of the unconstrained problem.
Remark 6.4. In Proposition 3(5) in [8] Fernandez and Bloch claimed that a nonholonomic system defined by a 2-distribution D on a 3-manifold M cannot be conditionally variational (i.e., the set of nonholonomic trajectories of this system cannot be a subset of the set of vakonomic trajectories) unless D is integrable. The above Example 6.3 contradicts this claim. This can also be seen directly at the level of the equations of motion. Indeed, we derived above that these are $\dot\alpha = \dot\beta = 0$, i.e., $\ddot x = \ddot y = 0$. Now $\ddot z = x\ddot y - y\ddot x = 0$ and hence each nonholonomic trajectory is a line respecting the constraints. It is a student exercise to check that such a curve is also an extremal of the unconstrained problem. By Proposition 3.7 it is also a vakonomic extremal. Actually Fernandez and Bloch proved the following fact: if a nonholonomic trajectory γ of a system defined by a 2-distribution D on a 3-manifold M is a vakonomic one with a Lagrange multiplier µ, then either µ ≡ 0 or D is integrable along γ. Hence their Proposition 3(5) should be correctly restated as follows:

Proposition 6.5. Consider a system given by a 2-distribution D on a 3-manifold M . Then every nonholonomic extremal is a vakonomic one if and only if it is also an unconstrained one.
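The observation in Remark 6.4 that nonholonomic trajectories of the Heisenberg system are straight lines can be illustrated numerically. The sketch below assumes the constraint in the form $\dot z = y\dot x - x\dot y$, read off from the annihilator dz − ydx + xdy quoted earlier; with the opposite sign convention the conclusion is the same. Function names and sample values are hypothetical.

```python
def z_dot(x, y, x_dot, y_dot):
    """Vertical velocity imposed by the assumed constraint dz - y dx + x dy = 0."""
    return y * x_dot - x * y_dot

def z_dot_along_line(x0, y0, u, v, t):
    """Constraint-imposed z-velocity along the uniform motion x = x0 + u t, y = y0 + v t."""
    return z_dot(x0 + u * t, y0 + v * t, u, v)

def z_along_line(x0, y0, z0, u, v, t):
    """Resulting z(t): linear in t, since the imposed z-velocity is constant."""
    return z0 + (y0 * u - x0 * v) * t
```

Along any uniform motion in the (x, y)-plane the terms proportional to t cancel, so the imposed $\dot z$ equals the constant y0 u − x0 v and the lifted curve is a straight line in R 3 .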
Example 6.6. The generalized Heisenberg system
We can generalize the previous Example 6.3 by considering Lagrangians of the following form L(x, y, z, α, β, γ) = f (x, y)α 2 + g(x, y)αβ + h(x, y)β 2 + Φ(x, y, z)γ 2 − U (x, y).
In this case L is invariant in the g-direction along the constraints distribution γ = 0 and thus on every nonholonomic trajectory $\langle h^* \frac{\partial L}{\partial h}, b \rangle = 0$ for any b ∈ g. Moreover, on the constraints distribution $\frac{\partial L}{\partial \eta} = (2f(x, y)\alpha + g(x, y)\beta)\,dx + (2h(x, y)\beta + g(x, y)\alpha)\,dy \in \mathrm{Ann}(g)$, so this covector annihilates g; furthermore, [d, g] h = 0 and [d, d] h ⊂ g. We clearly see that conditions (5.6) and (5.5) are trivially satisfied in this situation. Note that we came to these conclusions without the necessity of deriving the nonholonomic equations of motion, which in this case are quite complicated.
Conclusion. The generalized Heisenberg system answers question (Q1) positively. In fact, every nonholonomic extremal is also an extremal of the unconstrained problem.

Conclusions
In this paper we studied the problem of comparing nonholonomic and vakonomic constrained Lagrangian dynamics for the same set of constraints. Our approach was based on an observation (made already by several researchers [4,10,13,14,19] and rooted in a general philosophy of Tulczyjew [24]) that nonholonomically and vakonomically constrained Lagrangian systems can be put into the frames of the same unifying variational formalism (called in this paper a variational principle). The differences between these two systems appear at the level of admissible variations. The new idea is to concentrate solely on these differences, completely ignoring the resulting differences at the level of the equations of motion. In fact, we understand the equations of motion as secondary objects: the consequences of the underlying variational principle, not the fundamental description of the system. Such a point of view results in simplicity, since admissible variations have a much simpler description, and clearer geometric nature, than the resulting equations. Moreover, concentrating on the variations we can easily relate the problem of comparison of both dynamics with the symmetries of the system. As a particular realization of this strategy we studied (generalized) Chaplygin systems and left-invariant systems on Lie groups. Using our approach we were able to substantially generalize many classical results for such systems.