Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature: one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. The scaling regime one ends up with depends on certain features of the network architecture, such as the smoothness of the activation function. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit. In the case where the scaling limit is a stochastic differential equation, the deep network limit is shown to be described by a system of forward-backward stochastic differential equations.

Joint work with: Alain-Sam Cohen (InstaDeep Ltd), Alain Rossier (Oxford), Renyuan Xu (University of Southern California).
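The perceived ResNet/neural-ODE link referred to in the abstract can be sketched as follows (a standard illustration; the notation is ours, not taken from the talk). A depth-$L$ residual network with hidden state $x_k$ and layer weights $\theta_k$ performs the update

```latex
% Residual update of a depth-L ResNet (illustrative notation):
x_{k+1} = x_k + \tfrac{1}{L}\, f(x_k, \theta_k), \qquad k = 0, \dots, L-1.
% If the trained weights satisfy \theta_k \approx \theta(k/L) for some smooth
% function \theta(\cdot), each layer is a forward Euler step of size 1/L,
% and as L \to \infty the network formally converges to the neural ODE
\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f\bigl(x(t), \theta(t)\bigr), \qquad t \in [0, 1].
```

The abstract's point is precisely that this Euler-discretization picture presupposes the convergence of $\theta_k$ to a smooth limit, and that trained weights need not scale this way.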

LOCATION: Discussion Room, Newton Institute