From 3967903041e790fd15db19cc6e71cb0b237c1f34 Mon Sep 17 00:00:00 2001 From: baggepinnen <cont-frb@ulund.org> Date: Mon, 7 May 2018 10:00:01 +0200 Subject: [PATCH] add forgotten --- jump_lin_id/id_paper.tex | 29 +++---- jump_lin_id/pres/pres_idpaper.tex | 126 ++++++++++++++++++++++++++++++ 2 files changed, 141 insertions(+), 14 deletions(-) create mode 100644 jump_lin_id/pres/pres_idpaper.tex diff --git a/jump_lin_id/id_paper.tex b/jump_lin_id/id_paper.tex index 8c740e4..ae89c0a 100644 --- a/jump_lin_id/id_paper.tex +++ b/jump_lin_id/id_paper.tex @@ -1,4 +1,5 @@ -% \documentclass[a4paper, 10 pt]{article} \usepackage{iclr2018_conference,times} +% \documentclass[a4paper, 10 pt]{article} \usepackage{iclr2018_conference,times,amsthm} +% \documentclass[a4paper, 10 pt]{article} \usepackage{geometry,amsthm} \documentclass[letterpaper, 10 pt,conference]{ieeeconf}\IEEEoverridecommandlockouts %\overrideIEEEmargins % \pdfminorversion=4 @@ -164,9 +165,9 @@ Identification of LTV Dynamical Models with\\ Smooth or Discontinuous Time Evolu The difficulty of the task of identifying time-varying dynamical models of systems varies greatly with the model considered and the availability of measurements of the state sequence. For smoothly changing dynamics, linear in the parameters, the recursive least-squares algorithm with exponential forgetting (RLS$\lambda$) is a common option. If a Gaussian random-walk model for the parameters is assumed, a Kalman filtering/smoothing algorithm \cite{rauch1965maximum} gives the filtering/smoothing densities of the parameters in closed form. The assumption of smoothly (Gaussian) varying dynamics is often restrictive. Discontinuous dynamics changes occur, for instance, when an external controller changes operation mode, when a sudden contact between a robot and its environment is established, an unmodeled disturbance enters the system or when a system is suddenly damaged. -Identification of systems with non-smooth dynamics evolution has been studied extensively. The book~\cite{costa2006discrete} treats the case where the dynamics are known, but the state sequence unknown, i.e., state estimation. In~\cite{nagarajaiah2004time}, the authors examine the residuals from an initial constant dynamics fit to determine regions in time where improved fit is needed by the introduction of additional constant dynamics models. Results on identifiability and observability in jump-linear systems in the non-controlled (autonomous) setting are available in~\cite{vidal2002observability}. The main result on identifiability in \cite{vidal2002observability} was a rank condition on a Hankel matrix constructed from the collected output data, similar to classical results on the least-squares identification of ARX models which appears as rank constraints on the, typically Toeplitz or block-Toeplitz, regressor matrix. Identifiability of the methods proposed in this article are discussed in~\cref{sec:identifiability}. +Identification of systems with non-smooth dynamics evolution has been studied extensively. The book~\cite{costa2006discrete} treats the case where the dynamics are known, but the state sequence unknown, i.e., state estimation. In~\cite{nagarajaiah2004time}, the authors examine the residuals from an initial constant dynamics fit to determine regions in time where improved fit is needed by the introduction of additional constant dynamics models. 
Results on identifiability and observability in jump-linear systems in the non-controlled (autonomous) setting are available in~\cite{vidal2002observability}. The main result on identifiability in \cite{vidal2002observability} was a rank condition on a Hankel matrix constructed from the collected output data, similar to classical results on the least-squares identification of ARX models, which appear as rank constraints on the, typically Toeplitz or block-Toeplitz, regressor matrix. Identifiability of the methods proposed in this article is discussed in~\cref{sec:identifiability}.
-An LTV model can be seen as a first-order approximation of the dynamics of a nonlinear system around a trajectory. We emphasize that such an approximation will in general fail to generalize far from this trajectory, but many methods in reinforcement learning and control make efficient use of the linearized dynamics for optimization, while ensuring validity of the approximation by constraints or penalty terms. An example provided in \cref{sec:rl} highlights such a method.
+An LTV model can be seen as a first-order approximation of the dynamics of a nonlinear system around a trajectory. We emphasize that such an approximation will, in general, fail to generalize far from this trajectory, but many methods in reinforcement learning and control make efficient use of the linearized dynamics for optimization, while ensuring validity of the approximation by constraints or penalty terms. An example provided in \cref{sec:rl} highlights such a method.
An important class of identification methods that has been popularized lately is \emph{trend filtering} methods~\cite{kim2009ell_1, tibshirani2014adaptive}. Trend filtering methods work by specifying a \emph{fitness criterion} that determines the goodness of fit, as well as a \emph{regularization} term, often chosen with sparsity promoting qualities. As a simple example, consider the reconstruction $\hat y$ of a noisy signal $y = \{y_t\inspace{}\}_{t=1}^T$ with piecewise constant segments. To this end, we may formulate and solve the convex optimization problem
\begin{equation} \label{eq:tf}
@@ -206,7 +207,7 @@ where $\otimes$ denotes the Kronecker product and $K=n^2+nm$ is the number of mo
\section{Time-varying dynamics}
-We now move on to the contribution of this work, and extend our view to systems where the dynamics change with time. We limit the scope of this article to models on the form
+We now move on to the contribution of this work and extend our view to systems where the dynamics change with time. We limit the scope of this article to models of the form
\begin{equation} \label{eq:tvk}
\begin{split}
@@ -228,7 +229,7 @@ We emphasize here that the state in the parameter evolution model refers to the
The following sections will introduce a number of optimization problems with different regularization functions, corresponding to different choices of $p_w$, and different regularization arguments, corresponding to different choices of $H$. We also discuss the quality of the identification resulting from the different modeling choices.
-\subsection{Low frequency time evolution}
+\subsection{Low-frequency time evolution}
A slowly varying signal is characterized by \emph{small first-order time differences}. 
To identify slowly varying dynamics parameters, we thus penalize the squared 2-norm of the first-order time difference of the model parameters, and solve the optimization problem \begin{equation} \label{eq:slow} @@ -279,7 +280,7 @@ If the number of switches in dynamics parameters, $M$, is known in advance, the & \subjto & & \sum_t \textbf{1}\{ \w_{t+1} \neq \w_t\} \leq M \end{align} where $\textbf{1}\{\cdot\}$ is the indicator function. -This problem is non-convex and we propose solving it using dynamic programming (DP). For this purpose we modify the algorithm developed in \cite{bellman1961approximation}, an algorithm frequently referred to as segmented least-squares~\cite{bellman1969curve}, where a curve is approximated by piecewise linear segments. The modification lies in the association of each segment (set of consecutive time indices during which the parameters are constant) with a dynamics model, as opposed to a simple straight line.\footnote{Indeed, if a simple integrator is chosen as dynamics model and a constant input is assumed, the result of our extended algorithm reduces to the segmented least-squares solution.} Unfortunately, the computational complexity of the dynamic programming solution, $\mathcal{O}(T^2K^3)$, becomes prohibitive for large $T$.\footnote{For details regarding the DP algorithm and implementation, the reader is referred to the source-code repository accompanying this article.} +This problem is non-convex and we propose solving it using dynamic programming (DP). For this purpose, we modify the algorithm developed in \cite{bellman1961approximation}, an algorithm frequently referred to as segmented least-squares~\cite{bellman1969curve}, where a curve is approximated by piecewise linear segments. The modification lies in the association of each segment (set of consecutive time indices during which the parameters are constant) with a dynamics model, as opposed to a simple straight line.\footnote{Indeed, if a simple integrator is chosen as dynamics model and a constant input is assumed, the result of our extended algorithm reduces to the segmented least-squares solution.} Unfortunately, the computational complexity of the dynamic programming solution, $\mathcal{O}(T^2K^3)$, becomes prohibitive for large $T$.\footnote{For details regarding the DP algorithm and implementation, the reader is referred to the source-code repository accompanying this article.} \subsection{Piecewise linear time evolution}\label{sec:pwlinear} A piecewise linear signal is characterized by a \emph{sparse second-order time difference}, i.e., it has a small number of changes in the slope. A piecewise linear time-evolution of the dynamics parameters is hence obtained if we solve the optimization problem. @@ -307,7 +308,7 @@ Norm & $D_n$ & Result \\ \midrule \subsection{Two-step refinement} -Since many of the proposed formulations of the optimization problem penalize the size of the changes to the parameters, solutions in which the changes are slightly underestimated are favored. To mitigate this issue, a two-step procedure can be implemented where in the first step, change points (knots) are identified. In the second step, the penalty on the one-norm is removed and equality constraints are introduced between consecutive time-indices for which no change in dynamics was indicated by the first step. +Since many of the proposed formulations of the optimization problem penalize the size of the changes to the parameters, solutions in which the changes are slightly underestimated are favored. 
To mitigate this issue, a two-step procedure can be implemented where, in the first step, change points (knots) are identified. In the second step, the penalty on the one-norm is removed and equality constraints are introduced between consecutive time-indices for which no change in dynamics was indicated by the first step.
The second step can be computed very efficiently by noticing that the problem can be split into several identical sub-problems at the knots identified in the first step. The sub-problems have closed-form solutions if the problem in~\cref{sec:pwconstant} is considered.
@@ -366,7 +367,7 @@ In this special case, we introduce a recursive solution given by a modified Kalm
where $\bar{\cdot}$ denotes the posterior value. This additional correction can be interpreted as receiving a second measurement $\mu_0(v_t)$ with covariance $\Sigma_0(v_t)$.
For the Kalman-smoothing algorithm, $\hat{x}_{t|t}$ and $P_{t|t}$ in \labelcref{eq:postx,eq:postcov} are replaced with $\hat{x}_{t|T}$ and $P_{t|T}$.
-A prior over the output of the system, or a subset thereof, is straight forward to include in the estimation by means of an extra update step, with $C,R_2$ and $y$ being replaced with their corresponding values according to the prior.
+A prior over the output of the system, or a subset thereof, is straightforward to include in the estimation by means of an extra update step, with $C,R_2$ and $y$ being replaced with their corresponding values according to the prior.
\subsection{Kalman filter for identification}
We can employ the Kalman-based algorithm to solve two of the proposed optimization problems:
@@ -399,7 +400,7 @@ $(x_{t+1},x_t)$ is an ill-posed problem in the sense that the solution is non
unique. If we are given several pairs $(x_{t+1},x_t)$, for different $t$, while
$A$ remains constant, the problem becomes over-determined and well-posed in the
least-squares sense, provided that the vectors of state components
$\{x_t^{(i)}\}_{t=1}^T$ span $\mathbb{R}^n$.
The LTI-case in~\cref{sec:lti} is well posed according to
-classical results, when $\Phi$ has full column rank.
+classical results when $\Phi$ has full column rank.
When we extend our view to LTV models, the number of free parameters is
increased significantly, and the corresponding
@@ -408,7 +409,7 @@ rank and the introduction of a regularization term is necessary.
Informally, for every $n$ measurements, we have $K=n^2+nm$ free parameters.
If we consider the identification problem of~\cref{eq:pwconstant} and let
-$\lambda \rightarrow \infty$, the regularizer terms essentially becomes equality constraints.
+$\lambda \rightarrow \infty$, the regularizer terms essentially become equality constraints.
This will enforce a solution in which all parameters in $k$ are constant over
time, and the problem reduces to the LTI-problem. As $\lambda$ decreases,
@@ -485,7 +486,7 @@ to $$A_t = \left[
\right]
$$
occurred at $t=200$.
-The input was Gaussian noise of zero mean and unit variance, state transition noise and measuremet noise ($y^m_t = x_{t+1} + v^m_t$) of zero mean and $\sigma_{y_m} = 0.2$ were added.
+The input was Gaussian noise of zero mean and unit variance; state transition noise and measurement noise ($y^m_t = x_{t+1} + v^m_t$) of zero mean and $\sigma_{y_m} = 0.2$ were added.
\Cref{fig:ss} depicts the estimated coefficients in the dynamics matrices for a value of $\lambda$ chosen using the L-curve method~\cite{hansen1994regularization}. 
\begin{figure}
\centering
@@ -501,7 +502,7 @@ The input was Gaussian noise of zero mean and unit variance, state transition no
\section{Example -- Non-smooth robot arm with stiff contact}
To illustrate the ability of the proposed models to represent the non-smooth dynamics along a trajectory of a robot arm, we simulate a two-link robot with discontinuous Coulomb friction. We also let the robot establish a stiff contact with the environment to illustrate both strengths and weaknesses of the modeling approach.
-The state of the robot arm consists of two joint coordinates, $q$, and their time derivatives, $\dot q$. \Cref{fig:robot_train} illustrates the state trajectories, control torques and simulations of a model estimated by solving~\labelcref{eq:pwconstant}. The figure clearly illustrates that the model is able to capture the dynamics both during the non-smooth sign change of the velocity, but also during establishment of the stiff contact. The learned dynamics of the contact is however time-dependent, which is, in some situations, a drawback of the model and is illustrated in \Cref{fig:robot_val}, where the model is used on a validation trajectory where a different noise sequence was added to the control torque. Due to the novel input signal, the contact is established at a different time-instant and as a consequence, there is an error transient in the simulated data.
+The state of the robot arm consists of two joint coordinates, $q$, and their time derivatives, $\dot q$. \Cref{fig:robot_train} illustrates the state trajectories, control torques and simulations of a model estimated by solving~\labelcref{eq:pwconstant}. The figure clearly illustrates that the model is able to capture the dynamics both during the non-smooth sign change of the velocity and during the establishment of the stiff contact. The learned dynamics of the contact is, however, time-dependent, which is, in some situations, a drawback of the model; this is illustrated in \Cref{fig:robot_val}, where the model is used on a validation trajectory in which a different noise sequence was added to the control torque. Due to the novel input signal, the contact is established at a different time instant and, as a consequence, there is an error transient in the simulated data.
\begin{figure*}[htp]
\centering
\setlength{\figurewidth}{0.495\linewidth}
@@ -592,7 +593,7 @@ the LTV model is fit using a prior (\cref{sec:kalmanmodel}), the learning speed
% $z$-transform operator.
% The problem of estimating the coefficients in the transfer function remains
% linear given the input-output sequences $u$ and $y$. Extension to time-varying
-% dynamics is straight forward.
+% dynamics is straightforward.
This article presents methods for estimation of linear, time-varying models. The methods presented extend directly to nonlinear models that remain \emph{linear in the parameters}. When estimating an LTV model from a trajectory obtained from a nonlinear system, one is effectively estimating the linearization of the system around that trajectory. A first-order approximation to a nonlinear system is not guaranteed to generalize well as deviations from the trajectory become large. Many non-linear systems are, however, approximately \emph{locally} linear, such that they are well described by a linear model in a small neighborhood around the linearization/operating point. 
For certain methods, such as iterative learning control and trajectory centric reinforcement learning, a first-order approximation to the dynamics is used for efficient optimization, while the validity of the approximation is ensured by incorporating penalties or constraints between two consecutive trajectories. @@ -608,7 +609,7 @@ several orders of magnitude faster than solving the optimization problems with an iterative solver.} Example use cases include when dynamics are changing with a continuous auxiliary variable, such as temperature, altitude or velocity. If a smooth parameter drift is found to correlate with an auxiliary variable, -LPV-methodology can be employed to model the dependence explicitly. +LPV-methodology can be employed to model the dependency explicitly. Dynamics may change abruptly as a result of, e.g., system failure, change of operating mode, or when a sudden disturbance enters the system, such as a policy change affecting a market or diff --git a/jump_lin_id/pres/pres_idpaper.tex b/jump_lin_id/pres/pres_idpaper.tex new file mode 100644 index 0000000..4c519ea --- /dev/null +++ b/jump_lin_id/pres/pres_idpaper.tex @@ -0,0 +1,126 @@ +\documentclass[10pt,handout]{beamer} +% \usepackage{pgfpages} \pgfpagesuselayout{8 on 1}[a4paper,border shrink=5mm] +\usetheme[liontopcorner,framenumbers]{Regler2} +\usepackage{graphicx} +\usepackage[utf8]{inputenc} % For Swedish characters +% \usepackage{fontspec}\setsansfont{Roboto} +\usepackage{fourier}\DeclareSymbolFont{symbols2} {OMS}{cmsy}{m}{n} \DeclareSymbolFontAlphabet{\mathcal}{symbols2}\linespread{0.95} +\usefonttheme[]{serif} + +\usepackage{ulem} % For strikethrough +\normalem % Normal \emph text +\usepackage{cleveref} +\usepackage[style=verbose,autocite=footnote]{biblatex} +\addbibresource{../bibtexfile.bib} +\usepackage{siunitx} +\usepackage{color} +\usepackage{pgfplots} +\usepgfplotslibrary{groupplots} +\pgfplotsset{compat=newest} +\usepackage{tikz} +\usetikzlibrary{shapes,positioning} +\usetikzlibrary{narrow} +\tikzset{block/.style={draw, rectangle, line width=2pt, +minimum height=2em, minimum width=3em, outer sep=0pt}} +\tikzset{sumcircle/.style={draw, circle, outer sep=0pt, +label=center:{{$\sum$}}, minimum width=2em}} +\tikzset{every picture/.style={auto, line width=1pt, +>=narrow,font=\small}} + +% baby blue = 0.725, 0.827, 0.863 +% mint green = 0.678, 0.792, 0.722 +% baby pink = 0.914, 0.769, 0.78 +% goldish = 0.839, 0.824, 0.769 +% bronze = 0.61, 0.38, 0.08 + +\definecolor{babyblue}{rgb}{0.725, 0.827, 0.863} +\definecolor{mintgreen}{rgb}{0.678, 0.792, 0.722} +\definecolor{babypink}{rgb}{0.914, 0.769, 0.78} +\definecolor{goldish}{rgb}{0.839, 0.824, 0.769}\definecolor{goldishlight}{rgb}{0.961, 0.949, 0.925} +\definecolor{actualbronze}{rgb}{0.61, 0.38, 0.08} + +\definecolor{bronze}{rgb}{0.678, 0.792, 0.722} % Half bottom and block frames +\definecolor{header}{rgb}{0.725, 0.827, 0.863} % Header and half bottom +\definecolor{LUblue}{rgb}{0.61, 0.38, 0.08} % Title +\setbeamercolor{item}{fg=actualbronze} % Change color of item bullet +\setbeamercolor{block title}{use=structure,fg=white,bg=structure.fg} + +\title[Neural-Networks for Dynamical System Modeling]{Tangent-Space Regularization for Dynamical System Modeling using Neural Networks} + +\date{\today} +% \author[Fredrik Bagge Carlson]{\textbf{\large Fredrik Bagge Carlson}, \textnormal{Anders Robertsson, Rolf Johansson}} +\institute{Lund University, Department of Automatic Control} + + +\definecolor{red}{rgb}{0.7,0.2,0.2} +\definecolor{blue}{rgb}{0,0,0.55} 
+\definecolor{green}{rgb}{0.1,.65,0.25} +\definecolor{darkred}{rgb}{0.5,0,0} +\definecolor{yellow}{rgb}{.5,.5,0} +\definecolor{linecolor1}{rgb}{0,0,0.4} +\definecolor{linecolor2}{rgb}{1,0.55,0} +\definecolor{linecolor3}{rgb}{0.1,1,1} +\definecolor{linecolor4}{rgb}{0.6,0,0.5} +\newcommand{\nice}[1]{{\color{green}#1}} +\newcommand{\bad}[1]{{\color{red}#1}} +\newcommand{\niceat}[2]{{\color<#1>{green}#2}} +\newcommand{\badat}[2]{{\color<#1>{red}#2}} +\newcommand{\cmt}[1]{{\color{yellow}{\textbf{Comment:} #1}}} +\newcommand{\T}{^{\hspace{-0.1mm}\scriptscriptstyle \mathsf{T}}\hspace{-0.2mm}} +\newcommand{\iT}{^{-T}\hspace{-0.6mm}} +\newcommand{\norm}[1]{\begin{Vmatrix}#1\end{Vmatrix}_2} +\newcommand{\inspace}[1]{\in \mathbb{R}^{#1}} +\newcommand{\incspace}[1]{\in \mathbb{C}^{#1}} +\newcommand{\card}[1]{\text{card}(#1)} +\renewcommand{\v}{v} +\renewcommand{\a}{\dot{v}} +\newcommand{\amp}{A} +\newcommand{\A}{\mathbf{A}} +\newcommand{\w}{k} +\newcommand{\PI}{\left(\A \hspace{-0.2mm}\T\hspace{-0.1mm}\A\right)^{\hspace{-0.4mm}-1} \hspace{-1mm} \A\hspace{-0.3mm}\T} +\newcommand{\tA}{\tilde{\mathbf{A}}} +\DeclareMathOperator{\sign}{sign} +\DeclareMathOperator*{\argmin}{arg\,min} + + +\begin{document} +\newlength\figureheight +\newlength\figurewidth +% \setbeamercolor{background canvas}{bg=goldishlight} + + +\maketitle + + +%==================================================================== +%==================================================================== +\begin{frame}{Introduction} + Dynamical control systems are often described by differential state-equations + $$\dot x(t) = f_c(x(t), u(t))$$ + where $x$ is the state, $u$ is the input + \begin{block}{Example -- Robot} + $$\ddot x = M^{-1}(x) \big( C(x,\dot x)\dot x + G(x) + F(\dot x) - u \big)$$ + \end{block} + \pause + + Discretization (sampling) leads to + $$x_{t+1} = f(x_t, u_t)$$ + + \begin{block}{Objective 1} + Learn the function $f$ + $$x_{t+1} = f(x_t, u_t)$$ + \end{block} + +\end{frame} + + + +%==================================================================== +%==================================================================== +\begin{frame}{Open source}{} + Code to train the models presented in this talk available at + \url{https://github.com/baggepinnen/LTVModels.jl} +\end{frame} + + +\end{document} -- GitLab
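
For readers who want a concrete illustration of the piecewise-constant identification idea referenced in the paper (the sum-of-norms penalty on first-order parameter differences, eq. pwconstant), the following is a minimal, self-contained sketch in Python with cvxpy. It is not the implementation accompanying the paper (that is the Julia package linked from the slides, https://github.com/baggepinnen/LTVModels.jl); the system matrices, horizon, noise levels and the regularization weight lam below are invented purely for illustration.

# Sketch: piecewise-constant LTV identification via a group-sparsity penalty
# on parameter differences. All numbers here are illustrative assumptions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
T, n, m = 100, 2, 1                        # horizon and model dimensions
A1 = np.array([[0.95, 0.10], [0.00, 0.95]])
A2 = np.array([[0.50, 0.05], [0.00, 0.50]])  # dynamics after an abrupt change
B = np.array([[0.0], [1.0]])

# Simulate a trajectory whose dynamics switch halfway through
x = np.zeros((T + 1, n))
u = rng.standard_normal((T, m))
for t in range(T):
    A = A1 if t < T // 2 else A2
    x[t + 1] = A @ x[t] + B @ u[t] + 0.01 * rng.standard_normal(n)

# One parameter matrix k_t = [A_t B_t] per time step
K = [cp.Variable((n, n + m)) for _ in range(T)]
fit = sum(cp.sum_squares(x[t + 1] - K[t] @ np.concatenate([x[t], u[t]]))
          for t in range(T))
# Sum of norms of first-order differences -> piecewise-constant k_t
lam = 10.0  # regularization weight; would be tuned in practice (e.g. L-curve)
reg = sum(cp.norm(K[t + 1] - K[t], "fro") for t in range(T - 1))

prob = cp.Problem(cp.Minimize(fit + lam * reg))
prob.solve()

# With a suitable lam, the estimated A_t is approximately constant on each
# segment, with a single change point near t = T // 2
print(np.round(K[0].value[:, :n], 2))    # estimate before the change
print(np.round(K[-1].value[:, :n], 2))   # estimate after the change

The sum-of-norms regularizer is what makes the estimated parameter trajectory piecewise constant: it drives most consecutive differences exactly to zero while allowing a few large jumps, which is the behavior the paper exploits to locate abrupt changes in the dynamics.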