
Investigation of reinforcement learning for shape optimization of 2D profile extrusion die geometries

  • *Corresponding author: Daniel Wolff
  • Profile extrusion is a continuous production process for manufacturing plastic profiles from molten polymer. Of particular interest is the design of the die, through which the melt is pressed to attain the desired shape. However, due to an inhomogeneous velocity distribution at the die exit or residual stresses inside the extrudate, the final shape of the manufactured part often deviates from the desired one. To avoid these deviations, the shape of the die can be computationally optimized, which has already been investigated in the literature using classical optimization approaches [7,32,47].

    A new approach in the field of shape optimization is the utilization of reinforcement learning (RL) as a learning-based optimization algorithm. RL is based on trial-and-error interactions of an agent with an environment. For each action, the agent is rewarded and informed about the subsequent state of the environment. While not necessarily superior to classical, e.g., gradient-based or evolutionary, optimization algorithms for a single problem, RL techniques are expected to perform especially well when similar optimization tasks are repeated, since the agent learns a general strategy for generating optimal shapes instead of concentrating on just one problem.
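    This trial-and-error loop can be sketched with a toy scalar environment standing in for the CFD setup. All names here (`ToyShapeEnv`, the hand-coded policy) are hypothetical illustrations, not part of the paper's framework; a trained agent would replace the hard-coded action rule.

```python
import random

class ToyShapeEnv:
    """Toy stand-in for the CFD environment: the agent nudges one scalar
    design variable x toward an optimum unknown to it; the reward is the
    negative distance to that optimum. Purely illustrative."""

    def __init__(self, optimum=0.7):
        self.optimum = optimum
        self.x = 0.0

    def reset(self):
        self.x = random.uniform(0.0, 1.0)
        return self.x                      # observation s_t

    def step(self, action):
        # apply the incremental action a_t, clamped to the design bounds
        self.x = min(1.0, max(0.0, self.x + action))
        reward = -abs(self.x - self.optimum)   # reward r_t
        return self.x, reward

random.seed(0)
env = ToyShapeEnv()
obs = env.reset()
for t in range(100):
    # hand-coded stand-in for a learned policy pi(a_t | s_t)
    action = 0.05 if obs < env.optimum else -0.05
    obs, reward = env.step(action)
```

    With a fixed step size the "policy" ends up oscillating within one step of the optimum, which is exactly the situation an incremental RL strategy with a termination criterion is designed to handle.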

    In this work, we investigate this approach by applying it to two 2D test cases. The flow-channel geometry can be modified by the RL agent using so-called free-form deformation (FFD) [34], a method in which the computational mesh is embedded into a transformation spline that is then manipulated via its control-point positions. In particular, we investigate the impact of utilizing different agents on the training progress and the potential for wall-time savings from utilizing multiple environments during training.
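    The core of FFD can be illustrated in a few lines. The sketch below uses a Bernstein/Bézier lattice for brevity (the paper employs spline parameterizations, so the basis differs); the function names are hypothetical. Each embedded point with parametric coordinates $(u, v)$ is mapped through the lattice, so moving one control point smoothly deforms all embedded points.

```python
import numpy as np
from math import comb

def bernstein(n, i, u):
    """Bernstein polynomial B_{n,i}(u) on [0, 1]."""
    return comb(n, i) * u**i * (1 - u) ** (n - i)

def ffd_2d(points, control):
    """Map 2D points with parametric coordinates (u, v) in [0, 1]^2 through
    a transformation lattice `control` of shape (L+1, M+1, 2):
    x(u, v) = sum_ij B_{L,i}(u) * B_{M,j}(v) * P_ij."""
    L, M = control.shape[0] - 1, control.shape[1] - 1
    out = np.zeros((len(points), 2))
    for k, (u, v) in enumerate(points):
        for i in range(L + 1):
            for j in range(M + 1):
                out[k] += bernstein(L, i, u) * bernstein(M, j, v) * control[i, j]
    return out

# Control points initially placed at their parametric positions -> identity map.
ctrl = np.array([[[i / 2, j / 2] for j in range(3)] for i in range(3)], dtype=float)
pts = np.array([[0.5, 0.5], [0.25, 0.75]])
assert np.allclose(ffd_2d(pts, ctrl), pts)  # undeformed lattice reproduces the input

ctrl[1, 2, 1] += 0.3        # an "action": lift the top-middle control point
deformed = ffd_2d(pts, ctrl)
```

    In the RL setting, the agent's actions are exactly such control-point displacements, and the deformed mesh is handed to the flow solver.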

    Mathematics Subject Classification: Primary: 49Q10, 76D55; Secondary: 35Q30.

  • Figure 1.  Interaction loop between an agent and an environment during training. In each interaction / training step $ t $, the agent selects an action $ a_t $ according to a policy $ \pi $ based on observations of the current environment's state $ s_t $ and a numerical reward signal $ r_t $. This changes the state of the environment and generates the new observation and the new reward for the next step

    Figure 2.  Limited taxonomy of RL agents. Blue items represent categories of agents. Turquoise items are agents trained with the off-policy method. Magenta items are agents trained with the on-policy method

    Figure 3.  Visualization of the RL interaction between an agent and the environment in our $ \texttt{releso} $ framework. The environment has been customized to our shape optimization problem. It comprises the base mesh and the deformation spline used for the geometry parameterization, an FFD module deforming the mesh, the solver computing the governing PDE problem, and a component for postprocessing the simulation results to determine the reward and an observation of the CFD environment for the agent. The arrows correspond to the information flows between the different components: Green arrows represent meshes, yellow arrows spline parameterizations, and red arrows the simulation results. Based on the provided reward and observation, the agent chooses an action to modify the deformation spline of the FFD

    Figure 4.  Visualization of the key idea of FFD: An initial geometry (A) is transformed by modifying the control points (dark-blue dots) of a transformation spline (highlighted in light-blue) to obtain a deformed geometry (B)

    Figure 5.  Geometry and boundaries of the T-shaped geometry

    Figure 6.  (A) shows the deformation spline used for the parameterization of the T-shaped geometry. To make the FFD more generic, the actual geometry is scaled to the parametric space of the transformation spline before applying geometric modifications. Additionally, the possible movement of the control points is illustrated by the orange arrows in (B)

    Figure 7.  Comparison of different algorithms trained to optimize the T-shaped geometry following a direct strategy with respect to the episode reward over the training steps. Each run was repeated twice, as indicated by the same color

    Figure 8.  Comparison of different algorithms trained to optimize the T-shaped geometry following an incremental strategy with respect to the episode reward over the training steps. Each run was repeated twice, as indicated by the same color

    Figure 9.  Examples of optimal geometries obtained by a PPO agent following an incremental optimization strategy on the T-shaped geometry. One can see that the trained agent has learned a valid strategy which involves modifying the control points that strongly influence the cross-sectional area of the two outflows

    Figure 10.  Episode reward over training steps of a PPO agent following an incremental strategy for optimizing the T-shaped geometry when interacting with different numbers of environments

    Figure 11.  Episode reward over wall time of a PPO agent following an incremental strategy for optimizing the T-shaped geometry when interacting with different numbers of environments

    Figure 12.  Geometry and boundaries of the converging channel geometry

    Figure 13.  Deformation spline used for the parameterization of the channel geometry. Additionally, the possible movement of the control points is illustrated by the orange arrows

    Figure 14.  Outflow boundary of the converging channel geometry. The outflow is divided into three patches $ \Gamma_{\text{out}, i} $, for which the flow-homogeneity criterion is evaluated

    Figure 15.  Comparison of different agents trained to optimize the converging channel geometry following a direct strategy with respect to the episode reward over the training steps. Each run was repeated twice, as indicated by the same color

    Figure 16.  Comparison of different agents trained to optimize the converging channel geometry following an incremental strategy with respect to the episode reward over the training steps. Each run was repeated twice, as indicated by the same color

    Figure 17.  Examples of optimal geometries obtained by a PPO agent following an incremental optimization strategy for the converging channel geometry. Each of the shown geometries has been generated from a random initial geometry obtained by a random perturbation of the control points. Qualitatively, the agent has learned to contract the channel as smoothly as possible to ensure optimal flow homogeneity

    Figure 18.  Comparison of different agents trained to optimize the converging channel geometry following a direct strategy with respect to the steps per episode over the training steps. Each run was repeated twice, as indicated by the same color

    Figure 19.  Comparison of different agents trained to optimize the converging channel geometry following an incremental strategy with respect to the steps per episode over the training steps. Each run was repeated twice, as indicated by the same color

    Figure 20.  Episode reward over training steps of a PPO agent following an incremental strategy for optimizing the converging channel geometry when interacting with different numbers of environments

    Figure 21.  Episode reward over wall time of a PPO agent following an incremental strategy for optimizing the converging channel geometry when interacting with different numbers of environments

    Table 1.  Compatibility of the agents implemented in $ \texttt{stable-baselines3} $ with regard to the direct and incremental shape optimization method

    Agent Incremental Direct
    PPO $\checkmark$ $\checkmark$
    DQN $\checkmark$ -
    SAC - $\checkmark$
    A2C $\checkmark$ $\checkmark$
    DDPG - $\checkmark$

    Table 2.  Material properties of the shear-thinning material law for all test cases

    Property Symbol Value Unit
    zero-shear viscosity $ A $ 10935 kg m$^{-1}$ s$^{-1}$
    reciprocal transition rate $ B $ 0.433 s$^{-1}$
    slope of viscosity curve in pseudoplastic region $ C $ 0.699 -
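    For reference, these three constants parameterize the shear-thinning Carreau-type law [4]. A minimal sketch of one common three-parameter form, $ \eta(\dot{\gamma}) = A / (1 + B\,\dot{\gamma})^{C} $, is shown below; this exact expression is an assumption, and the form used by the paper's solver may differ.

```python
def carreau_viscosity(gamma_dot, A=10935.0, B=0.433, C=0.699):
    """Shear-thinning viscosity in a three-parameter Carreau-type form
    eta(gamma_dot) = A / (1 + B * gamma_dot)**C, with
    A: zero-shear viscosity, B: reciprocal transition rate,
    C: slope of the viscosity curve in the pseudoplastic region.
    Assumed form for illustration; the solver's expression may differ."""
    return A / (1.0 + B * gamma_dot) ** C

# at vanishing shear rate the viscosity equals the zero-shear viscosity A
eta0 = carreau_viscosity(0.0)
# the viscosity decreases monotonically with increasing shear rate
eta_fast = carreau_viscosity(100.0)
```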

    Table 3.  Wall-clock times of the agents trained with the direct optimization method to optimize the T-shaped geometry

    Agent Max. training time
    PPO 26.1 h
    A2C 46.5 h
    SAC 26.0 h
    DDPG 33.5 h

    Table 4.  Wall-clock times of the agents trained with the incremental optimization method for the T-shaped geometry use case.

    Agent Max. training time
    PPO 49.9 h
    A2C 45.1 h
    DQN 45.0 h

    Table 5.  Wall-clock times of the agents trained with the direct optimization method for the converging channel use case

    Agent Training time
    PPO 64.2 h
    A2C 44.0 h
    SAC 70.0 h
    DDPG 45.0 h

    Table 6.  Wall-clock times of the agents trained with the incremental optimization method for the converging channel use case

    Agent Training time
    PPO 42.2 h
    A2C 43.1 h
    DQN 44.7 h
  • [1] K. Arulkumaran, M. P. Deisenroth, M. Brundage and A. A. Bharath, Deep reinforcement learning: A brief survey, IEEE Signal Processing Magazine, 34 (2017), 26-38. doi: 10.1109/MSP.2017.2743240.
    [2] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang and W. Zaremba, OpenAI Gym, Computer Science, (2016), 1-4, http://arXiv.org/abs/1606.01540.
    [3] L. Buşoniu, T. de Bruin, D. Tolić, J. Kober and I. Palunko, Reinforcement learning for control: Performance, stability, and deep approximators, Annual Reviews in Control, 46 (2018), 8-28. doi: 10.1016/j.arcontrol.2018.09.005.
    [4] P. J. Carreau, Rheological equations from molecular network theories, Transactions of the Society of Rheology, 16 (1972), 99-127.  doi: 10.1122/1.549276.
    [5] J. A. Cottrell, T. J. R. Hughes and Y. Bazilevs, Isogeometric Analysis, John Wiley & Sons, Ltd, Chichester, UK, 2009. doi: 10.1002/9780470749081.
    [6] F. Dworschak, S. Dietze, M. Wittmann, B. Schleich and S. Wartzack, Reinforcement learning for engineering design automation, Advanced Engineering Informatics, 52 (2022), 101612. doi: 10.1016/j.aei.2022.101612.
    [7] S. Elgeti, M. Probst, C. Windeck, M. Behr, W. Michaeli and C. Hopmann, Numerical shape optimization as an approach to extrusion die design, Finite Elements in Analysis and Design, 61 (2012), 35-43. doi: 10.1016/j.finel.2012.06.008.
    [8] P. Garnier, J. Viquerat, J. Rabault, A. Larcher, A. Kuhnle and E. Hachem, A review on deep reinforcement learning for fluid mechanics, Computers and Fluids, 225 (2021), 104973, 13 pp. doi: 10.1016/j.compfluid.2021.104973.
    [9] H. Ghraieb, J. Viquerat, A. Larcher, P. Meliga and E. Hachem, Single-step deep reinforcement learning for two- and three-dimensional optimal shape design, AIP Advances, 12 (2022), 085108. doi: 10.1063/5.0097241.
    [10] T. Haarnoja, A. Zhou, P. Abbeel and S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, 35th International Conference on Machine Learning, ICML 2018, 5 (2018), 2976-2989.
    [11] C. Hopmann and W. Michaeli, Extrusion Dies for Plastics and Rubber, 4th edition, Carl Hanser Verlag GmbH & Co. KG, München, 2016. doi: 10.3139/9781569906248.
    [12] L. P. Kaelbling, M. L. Littman and A. W. Moore, Reinforcement learning: A survey, Journal of Artificial Intelligence Research, 4 (1996), 237-285, https://dl.acm.org/doi/10.5555/1622737.1622748, http://arXiv.org/abs/cs/9605103. doi: 10.1613/jair.301.
    [13] J. Kober and J. Peters, Reinforcement Learning in Robotics: A Survey, Learning Motor Skills, (2014), 9-67. doi: 10.1007/978-3-319-03194-1_2.
    [14] V. R. Konda and J. N. Tsitsiklis, Actor-critic algorithms, Advances in Neural Information Processing Systems, 1008-1014.
    [15] A. Lampton, A. Niksch and J. Valasek, Morphing airfoils with four morphing parameters, AIAA Guidance, Navigation and Control Conference and Exhibit, (2012), 21 pp. doi: 10.2514/6.2008-7282.
    [16] J. Lee, gustaf, https://github.com/tataratat/gustaf.
    [17] R. Li, Y. Zhang and H. Chen, Learning the aerodynamic design of supercritical airfoils through deep reinforcement learning, AIAA Journal, 59 (2021), 3988-4001. doi: 10.2514/1.J060189.
    [18] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver and D. Wierstra, Continuous control with deep reinforcement learning, 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings.
    [19] W. Michaeli, S. Kaul and T. Wolff, Computer-aided optimization of extrusion dies, Journal of Polymer Engineering, 21. doi: 10.1515/POLYENG.2001.21.2-3.225.
    [20] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Harley, T. P. Lillicrap, D. Silver and K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, 33rd International Conference on Machine Learning, ICML 2016, 4 (2016), 2850-2869.
    [21] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, Playing atari with deep reinforcement learning, Computer Science, (2013), 1-9, http://arXiv.org/abs/1312.5602.
    [22] J. M. Nóbrega, O. S. Carneiro, F. T. Pinho and P. J. Oliveira, Flow balancing in extrusion dies for thermoplastic profiles, Part III: Experimental assessment, International Polymer Processing, 19 (2004), 225-235.
    [23] T. Osswald and N. Rudolph, Polymer Rheology, Carl Hanser Verlag GmbH & Co. KG, 2015.
    [24] L. Pauli, M. Behr and S. Elgeti, Towards shape optimization of profile extrusion dies with respect to homogeneous die swell, Journal of Non-Newtonian Fluid Mechanics, 200 (2013), 79-87. doi: 10.1016/j.jnnfm.2012.12.002.
    [25] L. Piegl and W. Tiller, The NURBS Book, Monographs in Visual Communications, Springer Berlin Heidelberg, Berlin, Heidelberg, 1995. doi: 10.1007/978-3-642-97385-7.
    [26] J. F. Pittman, Computer-aided design and optimization of profile extrusion dies for thermoplastics and rubber: A review, Proceedings of the Institution of Mechanical Engineers, Part E: Journal of Process Mechanical Engineering, 225 (2011), 280-321.  doi: 10.1177/0954408911415324.
    [27] S. Qin, S. Wang, L. Wang, C. Wang, G. Sun and Y. Zhong, Multi-objective optimization of cascade blade profile based on reinforcement learning, Applied Sciences (Switzerland), 11 (2021), 1-27. doi: 10.3934/ipi.2021045.
    [28] J. Rabault and A. Kuhnle, Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach, Physics of Fluids, 31 (2019), 094105.  doi: 10.1063/1.5116415.
    [29] J. Rabault, F. Ren, W. Zhang, H. Tang and H. Xu, Deep reinforcement learning in fluid mechanics: A promising method for both active flow control and shape optimization, Journal of Hydrodynamics, 32 (2020), 234-246. doi: 10.1007/s42241-020-0028-y.
    [30] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus and N. Dormann, Stable-baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research, 22 (2021), 1-8.
    [31] A. Rajkumar, L. L. Ferrás, C. Fernandes, O. S. Carneiro, M. Becker and J. M. Nóbrega, Design guidelines to balance the flow distribution in complex profile extrusion dies, International Polymer Processing, 32 (2017), 58-71. doi: 10.3139/217.3272.
    [32] A. Rajkumar, L. L. Ferrás, C. Fernandes, O. S. Carneiro and J. M. Nóbrega, Guidelines for balancing the flow in extrusion dies: The influence of the material rheology, Journal of Polymer Engineering, 38 (2018), 197-211. doi: 10.1515/polyeng-2016-0449.
    [33] J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, Proximal policy optimization algorithms, CoRR, (2017), 1-12, http://arXiv.org/abs/1707.06347.
    [34] T. W. Sederberg and S. R. Parry, Free-form deformation of solid geometric models, Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1986, 20 (1986), 151-160.  doi: 10.1145/15922.15903.
    [35] R. Siegbert, J. Kitschke, H. Djelassi, M. Behr and S. Elgeti, Comparing optimization algorithms for shape optimization of extrusion dies, PAMM, 14 (2014), 789-794. doi: 10.1002/pamm.201410377.
    [36] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan and D. Hassabis, A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, 362 (2018), 1140-1144. doi: 10.1126/science.aar6404.
    [37] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition, The MIT Press, 2018, http://incompleteideas.net/book/the-book-2nd.html.
    [38] I. Szarvasy, J. Sienz, J. F. T. Pittman and E. Hinton, Computer aided optimisation of profile extrusion dies, International Polymer Processing, 15 (2000), 28-39. doi: 10.3139/217.1577.
    [39] T. E. Tezduyar, J. Liou and M. Behr, A new strategy for finite element computations involving moving boundaries and interfaces-the DSD/ST procedure. I. The concept and the preliminary numerical tests, Computer Methods in Applied Mechanics and Engineering, 94 (1992), 339-351. doi: 10.1016/0045-7825(92)90059-S.
    [40] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps and D. Silver, Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature, 575 (2019), 350-354. doi: 10.1038/s41586-019-1724-z.
    [41] J. Viquerat, R. Duvigneau, P. Meliga, A. Kuhnle and E. Hachem, Policy-based optimization: Single-step policy gradient method seen as an evolution strategy, Neural Computing and Applications, 35 (2023), 449–467, http://arXiv.org/abs/2104.06175. doi: 10.1007/s00521-022-07779-0.
    [42] J. Viquerat, P. Meliga and E. Hachem, A review on deep reinforcement learning for fluid mechanics: An update, Computational Physics, (2022), http://arXiv.org/abs/2107.12206.
    [43] J. Viquerat, J. Rabault, A. Kuhnle, H. Ghraieb, A. Larcher and E. Hachem, Direct shape optimization through deep reinforcement learning, Journal of Computational Physics, 428 (2021), 110080, 12 pp. doi: 10.1016/j.jcp.2020.110080.
    [44] D. Wolff, C. D. Fricke, M. Kemmerling and S. Elgeti, [WIP] Towards shape optimization of flow channels in profile extrusion dies using reinforcement learning, Proceedings in Applied Mathematics and Mechanics, 22.
    [45] X. Yan, J. Zhu, M. Kuang and X. Wang, Aerodynamic shape optimization using a novel optimizer based on machine learning techniques, Aerospace Science and Technology, 86 (2019), 826-835. doi: 10.1016/j.ast.2019.02.003.
    [46] O. Yilmaz, H. Gunes and K. Kirkkopru, Optimization of a profile extrusion die for flow balance, Fibers and Polymers, 15 (2014), 753-761. doi: 10.1007/s12221-014-0753-3.
    [47] G. Zhang, X. Huang, S. Li and T. Deng, Optimized design method for profile extrusion die based on NURBS modeling, Fibers and Polymers, 20 (2019), 1733-1741. doi: 10.1007/s12221-019-1168-y.
