From 7c7cb94ac7c1fbbcc0f943107233f3e63e0fc550 Mon Sep 17 00:00:00 2001
From: baggepinnen <cont-frb@ulund.org>
Date: Wed, 17 Jan 2018 13:25:41 +0100
Subject: [PATCH] Updates based on reviews

---
 jump_lin_id/bibtexfile.bib |  13 ++-
 jump_lin_id/id_paper.tex   | 206 +++++++++++++++++++++----------------
 2 files changed, 127 insertions(+), 92 deletions(-)

diff --git a/jump_lin_id/bibtexfile.bib b/jump_lin_id/bibtexfile.bib
index 9cd7a72..5036b3d 100644
--- a/jump_lin_id/bibtexfile.bib
+++ b/jump_lin_id/bibtexfile.bib
@@ -77,7 +77,7 @@
 }
 
 @article{kim2009ell_1,
-  title={$\backslash$ell\_1 Trend Filtering},
+  title={$\ell_1$ Trend Filtering},
   author={Kim, Seung-Jean and Koh, Kwangmoo and Boyd, Stephen and Gorinevsky, Dimitry},
   journal={SIAM review},
   volume={51},
@@ -150,3 +150,14 @@
   year={2002},
   organization={IEEE}
 }
+
+@article{parikh2014proximal,
+  title={Proximal algorithms},
+  author={Parikh, Neal and Boyd, Stephen and others},
+  journal={Foundations and Trends{\textregistered} in Optimization},
+  volume={1},
+  number={3},
+  pages={127--239},
+  year={2014},
+  publisher={Now Publishers, Inc.}
+}
diff --git a/jump_lin_id/id_paper.tex b/jump_lin_id/id_paper.tex
index 227d752..860f6c5 100644
--- a/jump_lin_id/id_paper.tex
+++ b/jump_lin_id/id_paper.tex
@@ -1,5 +1,5 @@
-\documentclass[letterpaper, 10 pt,conference]{ieeeconf}
-\IEEEoverridecommandlockouts
+% \documentclass[a4paper, 10 pt]{article} \usepackage{iclr2018_conference,times}
+\documentclass[letterpaper, 10 pt,conference]{ieeeconf}\IEEEoverridecommandlockouts
 %\overrideIEEEmargins
 % \pdfminorversion=4
 \usepackage{graphicx}
@@ -137,10 +137,13 @@ SE22100 Lund Sweden\protect\\
 \Crefname{proposition}{Proposition}{Propositions}
 \crefname{corollary}{Corollary}{Corollaries}
 \Crefname{corollary}{Corollary}{Corollaries}
+\crefname{theorem}{Theorem}{Theorems}
+\Crefname{theorem}{Theorem}{Theorems}
 
 \begin{document}
 \newtheorem{proposition}{Proposition}
 \newtheorem{corollary}{Corollary}
+\newtheorem{theorem}{Theorem}
 \newlength\figureheight
 \newlength\figurewidth
 \setlength{\figurewidth}{0.4\textwidth}
@@ -152,11 +155,12 @@ SE22100 Lund Sweden\protect\\
 \pagestyle{empty}
 
 \begin{abstract}
-We review some classical methods for offline identification of linear, time-varying (LTV) dynamical models and put them into a larger, more general context where we introduce a number of new techniques. Specifically, we introduce methods for identification of linear dynamical systems where the change in dynamics can be either discontinuous or smooth and where the number of changes in the dynamics parameters may be unknown or specified. We further discuss the introduction of priors on the model parameters for increased statistical power in situations where excitation is insufficient for identification. The identification problems are cast as convex optimization problems and are applicable to, e.g., ARX-models and state-space models with time-varying parameters. We illustrate usage of the methods in simulations and in an application of model-based reinforcement learning. \cmt{We establish a connection between trend filtering and system identification which results in novel sys id methods.}
+    We establish a connection between trend filtering and system identification which results in a family of new identification methods for linear, time-varying (LTV) dynamical models based on convex optimization. We demonstrate how the design of the cost function promotes either a continuous change in dynamics over time or discontinuous changes in the model coefficients occurring at a sparse set of time instances. We further discuss the introduction of priors on the model parameters for situations where excitation is insufficient for identification. The identification problems are cast as convex optimization problems and are applicable to, e.g., ARX-models and state-space models with time-varying parameters. We illustrate usage of the methods in simulations of a nonlinear robot arm with non-smooth friction and in model-based, trajectory-centric reinforcement learning on a smooth nonlinear system.
+    % \cmt{Rewrote abstract to make contribution stand out from previous work and highlight interesting nonlinear and non-smooth nature of examples.}
 \end{abstract}
-\begin{keywords}
-   System Identification, Piecewise Linear Modeling, Trend Filtering, Kalman Smoothing, Time-Series Segmentation, Reinforcement Learning.
-\end{keywords}
+% \begin{keywords}
+%    System Identification, Piecewise Linear Modeling, Trend Filtering, Kalman Smoothing, Time-Series Segmentation, Reinforcement Learning.
+% \end{keywords}
 
 
 % ---------------------------------------------------------
@@ -164,17 +168,19 @@ We review some classical methods for offline identification of linear, time-vary
 % ---------------------------------------------------------
 \section{Introduction}\label{sec:introduction}
 
-The difficulty of the task of identifying time-varying dynamical models of systems varies greatly with the model considered and the availability of measurements of the state sequence. For smoothly changing, linear dynamics,\footnote{We take linear dynamics to mean models that are linear in the parameters.} the recursive least-squares algorithm with exponential forgetting (RLS$\lambda$) is a common alternative. If a Gaussian random-walk model for the parameters is assumed, a Kalman filtering/smoothing algorithm gives the filtering/smoothing densities of the parameters in closed form. The assumption of smoothly (Gaussian) varying dynamics is often restrictive. Discontinuous dynamics changes occur, for instance, when an external controller changes operation mode, when a sudden contact between a robot and its environment is established, an unmodeled disturbance enters the system or when a system is suddenly damaged.
+The difficulty of the task of identifying time-varying dynamical models of systems varies greatly with the model considered and the availability of measurements of the state sequence. For smoothly changing, linear dynamics,\footnote{Linear dynamics in the sense of models that are linear in the parameters.} the recursive least-squares algorithm with exponential forgetting (RLS$\lambda$) is a common option. If a Gaussian random-walk model for the parameters is assumed, a Kalman filtering/smoothing algorithm gives the filtering/smoothing densities of the parameters in closed form. The assumption of smoothly (Gaussian) varying dynamics is often restrictive. Discontinuous dynamics changes occur, for instance, when an external controller changes operation mode, when a sudden contact between a robot and its environment is established, an unmodeled disturbance enters the system or when a system is suddenly damaged.
 
-Identification of systems with non-smooth dynamics evolution has been studied extensively. The book~\cite{costa2006discrete} treats the case where the dynamics are known, but the state sequence unknown. In~\cite{nagarajaiah2004time}, the authors examine the residuals from an initial constant dynamics fit to determine regions in time where improved fit is needed by the introduction of additional constant dynamics models. Results on identifiability and observability in jump-linear systems in the non-controlled setting are available in~\cite{vidal2002observability}. The main result on identifiability was a rank condition on a Hankel matrix constructed from the collected output data, similar to classical results on the least-squares identification of ARX-models which appears as rank constraints on the, typically Toeplitz or block-Toeplitz, regressor matrix. Identifiability of the methods proposed in this article are discussed in~\cref{sec:identifiability}.
+Identification of systems with non-smooth dynamics evolution has been studied extensively. The book~\cite{costa2006discrete} treats the case where the dynamics are known, but the state sequence unknown, i.e., state estimation. In~\cite{nagarajaiah2004time}, the authors examine the residuals from an initial constant dynamics fit to determine regions in time where improved fit is needed by the introduction of additional constant dynamics models. Results on identifiability and observability in jump-linear systems in the non-controlled (autonomous) setting are available in~\cite{vidal2002observability}. The main result on identifiability in \cite{vidal2002observability} was a rank condition on a Hankel matrix constructed from the collected output data, similar to classical results on the least-squares identification of ARX-models, which appear as rank constraints on the, typically Toeplitz or block-Toeplitz, regressor matrix. Identifiability of the methods proposed in this article is discussed in~\cref{sec:identifiability}.
+
+An LTV model can be seen as a first-order approximation of the dynamics of a nonlinear system around a trajectory. We emphasize that such an approximation will in general fail to generalize far from this trajectory, but many methods in reinforcement learning and control make efficient use of the linearized dynamics for optimization, while ensuring validity of the approximation by constraints or penalty terms. One of the examples provided towards the end of this work highlights such a method.
 
 An important class of identification methods that have been popularized lately are \emph{trend filtering} methods~\cite{kim2009ell_1, tibshirani2014adaptive}. Trend filtering methods work by specifying a \emph{fitness criterion} that determines the goodness of fit, as well as a \emph{regularization} term, often chosen with sparsity promoting qualities. As a simple example, consider the reconstruction $\hat y$ of a noisy signal $y = \left\{y_t\inspace{}\right\}_{t=1}^T$ with piecewise constant segments. To this end, we might formulate and solve the convex optimization problem
 \begin{equation} \label{eq:tf}
    \minimize{\hat{y}} \normt{y-\hat{y}}^2 + \lambda\sum_t |\hat{y}_{t+1} - \hat{y}_t|
 \end{equation}
-The first term is the fitness criterion or \emph{loss function}, whereas the second term is a sparsity-promoting regularizer which promotes small changes between consecutive samples in the reconstructed signal. The sparsity promoting effect of the 1-norm regularizer is well known, and stems from the constant size of the gradient whenever the argument is non-zero~\cite{murphy2012machine}. Compare this to the squared difference, for which the gradient rapidly vanishes as the argument approaches zero. The squared difference will thus promote \emph{small} arguments, whereas the 1-norm promotes \emph{sparse} arguments.
+The first term is the fitness criterion or \emph{loss function}, whereas the second term is a sparsity-promoting regularizer which promotes small changes between consecutive samples in the reconstructed signal. The sparsity promoting effect of the 1-norm regularizer is well known, and stems from the constant length of the gradient whenever the argument is non-zero~\cite{murphy2012machine}. Compare this to the squared difference, for which the gradient rapidly vanishes as the argument approaches zero. The squared difference will thus promote \emph{small} arguments, whereas the 1-norm promotes \emph{sparse} arguments.
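The contrast between the two penalties can be made concrete through their proximal operators, the basic building blocks of splitting methods such as ADMM. The following is a minimal numpy sketch for illustration only, not the paper's implementation:

```python
import numpy as np

def prox_abs(z, lam):
    # prox of lam*|z|: soft thresholding; arguments with |z| <= lam
    # are mapped to exact zeros (sparsity)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_square(z, lam):
    # prox of lam*z^2: pure shrinkage; arguments become small
    # but never exactly zero
    return z / (1.0 + 2.0 * lam)

z = np.array([-2.0, -0.3, 0.1, 0.8])
print(prox_abs(z, 0.5))     # two entries become exactly zero
print(prox_square(z, 0.5))  # all entries merely shrink by a factor 1/2
```

The 1-norm prox keeps pushing with constant force until the argument hits zero, which is exactly the mechanism behind the sparse differences in~\labelcref{eq:tf}.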
 
-In this work, we will draw inspiration from the trend-filtering literature to develop new system identification methods for LTV models, with interesting properties. We start by defining a set of optimization problems with a least-squares loss function and carefully chosen regularization terms. We further discuss how prior information can be utilized to increase the accuracy of the identification and end the article with an application of model-based reinforcement learning and a discussion.
+In this work, we draw inspiration from the trend-filtering literature to develop new system identification methods for LTV models with interesting properties. In trend filtering, a curve is decomposed into a set of polynomial segments. In the identification methods proposed in this work, we instead decompose a multivariable state sequence into the output of a set of LTV models, where the model coefficients evolve as polynomial functions of time. We start by defining a set of optimization problems with a least-squares loss function and carefully chosen regularization terms. We further discuss how prior information can be utilized to increase the accuracy of the identification, and end the article with the identification of a nonlinear system with non-smooth friction, an example of model-based reinforcement learning, and a discussion.
 
 \section{LTI identification}\label{sec:lti}
 
@@ -182,13 +188,13 @@ We start by considering the case of identification of the parameters in an LTI m
 \begin{equation}
    x_{t+1} = A x_t + B u_t + v_t, \quad t \in [1,T]
 \end{equation}
-where $x\inspace{n}$ and $u\inspace{m}$ are the state and input respectively. If the state and input sequences are known, a plethora of methods for estimating the parameters exists. A common method for identification of systems that are linear in the parameters is the least-squares (LS) method, which in case of Gaussian noise, $v$, coincides with the maximum likelihood (ML) estimate. To facilitate estimation using the LS method, we write the model on the form $\y = \A\w$, and arrange the data according to
+where $x\inspace{n}$ and $u\inspace{m}$ are the state and input respectively. A discussion of the noise term $v_t$ is deferred until~\cref{sec:general}, where we indicate how statistical assumptions on $v_t$ influence the cost function and the properties of the estimate. If the state and input sequences are known, a plethora of methods for estimating the parameters exists. A common method for identification of systems that are linear in the parameters is the least-squares (LS) method, which in case of Gaussian noise, $v$, coincides with the maximum likelihood (ML) estimate. To facilitate estimation using the LS method, we write the model on the form $\y = \A\w$, and arrange the data according to
 \begin{align*}
    \y &=
    \begin{bmatrix}
       {x_1} \\ \vdots \\ {x_T}
    \end{bmatrix} & &\inspace{Tn} \\
-   \w &= \vec{\bmatrixx{A & B}\T} & &\inspace{K}\\[0.2em]
+   \w &= \vec{\bmatrixx{A\T & B\T}} & &\inspace{K}\\[0.2em]
    \A &=
    \begin{bmatrix}
       I_n \otimes x_0\T & I_n \otimes u_0\T \\
@@ -206,7 +212,7 @@ where $\otimes$ denotes the Kronecker product and $K=n^2+nm$ is the number of mo
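As a sanity check of this data arrangement, the following numpy sketch simulates a noise-free LTI system, builds the regressor from Kronecker products as above, and recovers the parameters by least squares. The ordering of $\w$ is assumed here to stack the rows of $A$ followed by the rows of $B$, matching the block structure of $\A$; the system matrices are arbitrary illustrative values, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 1, 50
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])   # example system matrices
B = np.array([[0.0],
              [1.0]])

# simulate x_{t+1} = A x_t + B u_t (noise-free)
x = np.zeros((T + 1, n))
u = rng.standard_normal((T, m))
for t in range(T):
    x[t + 1] = A @ x[t] + B @ u[t]

# regressor row block for time t: [I_n (x) x_t^T, I_n (x) u_t^T]
In = np.eye(n)
Amat = np.vstack([np.hstack([np.kron(In, x[t]), np.kron(In, u[t])])
                  for t in range(T)])
y = x[1:].reshape(-1)        # stacked states x_1, ..., x_T

w, *_ = np.linalg.lstsq(Amat, y, rcond=None)
w_true = np.concatenate([A.reshape(-1), B.reshape(-1)])  # rows of A, then of B
print(np.allclose(w, w_true))  # True
```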
 
 
 \section{Time-varying dynamics}
-We now extend out view to systems where the dynamics change with time, and focus our attention to models on the form
+We now move on to the contribution of this work and extend our view to systems where the dynamics change with time. We limit the scope of this article to models of the form
 \begin{equation}
 \label{eq:tvk}
 \begin{split}
@@ -223,7 +229,7 @@ where the parameters $\w$ are assumed to evolve according to the dynamical syste
 \end{split}
 \end{equation}
 \recmt{This eq not clear. Motivate noise terms better.} \cmt{Connect noise terms to MLE}
-where, if no prior knowledge is available, the dynamics matrix $H_t$ can be taken as the identity matrix.
+where, if no prior knowledge is available, the dynamics matrix $H_t$ can be taken as the identity matrix, which implies that the model coefficients follow a random walk.
 The evolution of the dynamics is thus governed by the emission density function $p_v(x_{t+1} | x_t, u_t, k_t)$ and a state transition density function $p_w(k_{t+1}|k_t)$.
 The density $p_v$ is determining the distribution of measurements, given the current state of the system, whereas $p_w$ governs the transition of the parameter vector between time steps.
 We emphasize here that the state in the parameter evolution model refers to the current parameters $k_t$ and not the system state $x_t$, hence, $p_v$ is called the emission density and not the transition density. Particular choices of $p_v$ and $p_w$ emit data likelihoods concave in the parameters and hence amenable to convex optimization.
@@ -231,29 +237,28 @@ We emphasize here that the state in the parameter evolution model refers to the
 
 The following sections will introduce a number of optimization problems with different regularization functions and regularization arguments, corresponding to different choices of $p_w$, and discuss the quality of the resulting identification.
 
-\subsection{Slow time evolution \cmt{Low frequency time evolution}}
+\subsection{Low frequency time evolution}
 A slowly varying signal is characterized by \emph{small first-order time differences}.
 To identify slowly varying dynamics parameters, we thus penalize the squared 2-norm of the first-order time difference of the model parameters, and solve the optimization problem
 \begin{equation} \label{eq:slow}
    \minimize{\w} \normt{\y-\hat{\y}}^2 + \lambda\sum_t \normt{\w_{t+1} - \w_{t}}^2
 \end{equation}
-where $\sum_t$ denotes the sum over meaningful indices $t$, in this case $t\in [1,T-1]$.
-This optimization problem has a closed form solution\footnote{The computational complexity $\mathcal{O}\big((TK)^3\big)$ of computing $k^*$ using the closed-form solution as opposed to an iterative method becomes prohibitive for all but toy problems.}
+where $\sum_t$ denotes the sum over relevant indices $t$, in this case $t\in [1,T-1]$.
+This optimization problem has a closed-form solution given by
 \begin{align}\label{eq:closedform}
-    \tilde{\w}^* &= (\tilde{\A}\T\tilde{\A} + \lambda D_1\T D_1)^{-1}\tilde{\A}\T \tilde{Y}\\
+    \tilde{\w}^* &= (\tilde{\A}\T\tilde{\A} + \lambda^2 D_1\T D_1)^{-1}\tilde{\A}\T \tilde{Y}\\
     \tilde{\w} &= \operatorname{vec}(\w_1, ...\,, \w_T)\nonumber
 \end{align}
-\recmt{$\tilde{y}$ for a matrix is confusing}
-where $\tilde{\A}$ and $\tilde{Y}$ are appropriately constructed matrices and the first-order differentiation operator matrix $D_1$ is constructed such that $\lambda\normt{D_1 \tilde{\w}}^2$ equals the second term in~\labelcref{eq:slow}.
-\Cref{eq:slow} is the negative data log-likelihood of a Brownian random-walk parameter model for which also a Kalman smoother returns the optimal solution, see~\cref{sec:kalmanmodel} for details and extensions.
+where $\tilde{\A}$ and $\tilde{Y}$ are appropriately constructed matrices and the first-order differentiation operator matrix $D_1$ is constructed such that $\lambda^2\normt{D_1 \tilde{\w}}^2$ equals the second term in~\labelcref{eq:slow}.
+The computational complexity $\mathcal{O}\big((TK)^3\big)$ of computing $\tilde{\w}^*$ using the closed-form solution \labelcref{eq:closedform} becomes prohibitive for all but toy problems. We note that the cost function in \labelcref{eq:slow} is the negative data log-likelihood of a Brownian random-walk parameter model, which motivates us to develop a dynamic programming algorithm based on a Kalman smoother, detailed in~\cref{sec:kalmanmodel}.
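For intuition, the closed-form expression can be exercised in the scalar signal-denoising special case, where $\tilde{\A}$ reduces to the identity. This is an illustrative sketch only, with made-up data:

```python
import numpy as np

T, lam = 100, 5.0
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, T)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(T)

# first-order difference operator: (D1 @ v)[t] = v[t+1] - v[t]
D1 = np.diff(np.eye(T), axis=0)

# closed-form minimizer of ||y - yhat||^2 + lam^2 * ||D1 yhat||^2
yhat = np.linalg.solve(np.eye(T) + lam**2 * D1.T @ D1, y)

# the estimate is smoother than the raw signal
print(np.linalg.norm(D1 @ yhat) < np.linalg.norm(D1 @ y))  # True
```

Forming and factorizing the dense $(TK) \times (TK)$ system in the full problem is exactly the $\mathcal{O}\big((TK)^3\big)$ cost that the Kalman-smoother formulation avoids.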
 
 \subsection{Smooth time evolution}
 A smoothly varying signal is characterized by \emph{small second-order time differences}.
 To identify smoothly time-varying dynamics parameters, we thus penalize the squared 2-norm of the second-order time difference of the model parameters, and solve the optimization problem
 \begin{equation} \label{eq:smooth}
-   \minimize{\w} \normt{\y-\hat{\y}}^2 + \lambda\sum_t \normt{\w_{t+2} -2 \w_{t+1} + \w_t}^2
+   \minimize{\w} \normt{\y-\hat{\y}}^2 + \lambda^2\sum_t \normt{\w_{t+2} -2 \w_{t+1} + \w_t}^2
 \end{equation}
-Also this optimization problem has a closed form solution on the form~\labelcref{eq:closedform} with the corresponding second-order differentiation operator $D_2$. \Cref{eq:smooth} is the negative data log-likelihood of a Brownian random-walk parameter model with added momentum, for which a Kalman smoother with augmented state returns the optimal solution, elaborated upon in~\cref{sec:kalmanmodel}.
+This optimization problem also has a closed-form solution of the form~\labelcref{eq:closedform} with the corresponding second-order differentiation operator $D_2$. \Cref{eq:smooth} is the negative data log-likelihood of a Brownian random-walk parameter model with added momentum, for which a Kalman smoother with augmented state returns the optimal solution, elaborated upon in~\cref{sec:kalmanmodel}, where we extend problem \labelcref{eq:smooth} to more general regularization terms.
 
 
 
@@ -262,18 +267,17 @@ In the presence of discontinuous or abrupt changes in the dynamics, estimation m
 \begin{equation} \label{eq:pwconstant}
    \minimize{\w} \normt{\y-\hat{\y}}^2 + \lambda\sum_t \normt{ \w_{t+1} - \w_t}
 \end{equation}
-This \emph{grouped-lasso}-like formulation can be interpreted as putting a 1-norm penalty on the \emph{length} of time-difference parameter vectors.\footnote{With \emph{length} we consider the 2-norm of a vector. The 1-norm of length interpretation comes from the fact $\norm{\normt{\cdot}}_1 = \normt{\cdot}$.}
+We can interpret \labelcref{eq:pwconstant} as a \emph{grouped-lasso} cost function where, instead of groups being formed out of variables, the groups are defined by differences between variables. The regularizer thus places a 1-norm penalty on the \emph{lengths} of the difference vectors $\w_{t+1} - \w_t$, since $\norm{\normt{\cdot}}_1 = \normt{\cdot}$.
 The 1-norm is a \emph{sparsity-promoting} penalty, hence a solution in which only a small number of non-zero first-order time differences in the model parameters is favored, i.e., a piecewise constant dynamics evolution.
 At a first glance, one might consider the formulation
 \begin{equation} \label{eq:pwconstant_naive}
    \minimize{\w} \normt{\y-\hat{\y}}^2 + \lambda\sum_t \norm{\w_{t+1} - \w_t}_1
 \end{equation}
-which results in a dynamics evolution with sparse changes in the coefficients, but changes to different entries of $k$ are not necessarily occurring at the same time instants. The formulation~\labelcref{eq:pwconstant}, however, promotes a solution in which the change occurs at the same time instants for all coefficients in $A$ and $B$.
+which results in a dynamics evolution with sparse changes in the coefficients, but changes to different entries of $\w_t$ do not necessarily occur at the same time instants. The formulation~\labelcref{eq:pwconstant}, however, promotes a solution in which the change occurs at the same time instants for all coefficients in $A$ and $B$, i.e., $\w_{t+1} = \w_{t}$ for most $t$.
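The difference between \labelcref{eq:pwconstant} and \labelcref{eq:pwconstant_naive} is mirrored by the proximal operators of the two penalties: the group penalty zeroes out a whole difference vector at once, while the elementwise 1-norm zeroes entries independently. A minimal illustrative sketch with a made-up difference vector:

```python
import numpy as np

def prox_group(v, lam):
    # prox of lam*||v||_2 (block soft threshold): the whole vector
    # becomes zero together when ||v||_2 <= lam
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= lam else (1.0 - lam / nrm) * v

def prox_elementwise(v, lam):
    # prox of lam*||v||_1: entries are zeroed independently
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

dk = np.array([0.05, 0.8])   # hypothetical difference k_{t+1} - k_t
print(prox_group(dk, 1.0))        # entire change suppressed jointly
print(prox_elementwise(dk, 0.1))  # entries treated one by one
```

With the group prox, either all coefficients change at time $t$ or none do, which is precisely the joint change-point behavior promoted by \labelcref{eq:pwconstant}.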
 
-Dynamics may change abruptly as a result of, e.g., system failure, change of operating mode, or when a sudden disturbance enters the system, such as a policy change affecting a market or a window opening affecting the indoor temperature. This identification method can be employed to identify when such changes occur, without specifying apriori how many changes are expected.
 
 \subsubsection{Implementation}
-Due to the non-squared norm penalty $\sum_t \normt{ \w_{t+1} - \w_t}$, problem \labelcref{eq:pwconstant} is significantly harder to solve than \labelcref{eq:smooth}. An efficient implementation using the ADMM algorithm is made available in the accompanying repository.
+Due to the non-squared norm penalty $\sum_t \normt{ \w_{t+1} - \w_t}$, problem \labelcref{eq:pwconstant} is significantly harder to solve than \labelcref{eq:smooth}. An efficient implementation using the linearized ADMM algorithm \cite{parikh2014proximal} is made available in the accompanying repository.
 
 
 \subsection{Piecewise constant time evolution with known number of steps}
@@ -284,23 +288,24 @@ If the number of switches in dynamics parameters, $M$, is known in advance, the
    & \subjto &        & \sum_t \textbf{1}\{ \w_{t+1} \neq \w_t\} \leq M
 \end{align}
 where $\textbf{1}\{\cdot\}$ is the indicator function.
-This problem is non-convex, but admits an exact solution through dynamic programming (DP), with an extended version of \cite{bellman1961approximation}, an algorithm frequently referred to as segmented least-squares~\cite{bellman1969curve}. The extension lies in the association of each segment with a dynamics model, as opposed to a simple straight line \recmt{sentence not clear}.\footnote{Indeed, if a simple integrator is chosen as dynamics model and a constant input is assumed, the result of our extended algorithm reduces to the segmented least-squares solution.} Unfortunately, the computational complexity of the dynamic programming solution, $\mathcal{O}(T^2K^3)$, becomes prohibitive for large $T$.\footnote{For details regarding the DP algorithm and implementation, the reader is referred to the source-code repository accompanying this article.}
+This problem is non-convex and we propose solving it using dynamic programming (DP). For this purpose, we modify the algorithm developed in \cite{bellman1961approximation}, frequently referred to as segmented least-squares~\cite{bellman1969curve}, where a curve is approximated by piecewise linear segments. The modification lies in the association of each segment (a set of consecutive time indices during which the dynamics are constant) with a dynamics model, as opposed to a simple straight line.\footnote{Indeed, if a simple integrator is chosen as dynamics model and a constant input is assumed, the result of our extended algorithm reduces to the segmented least-squares solution.} Unfortunately, the computational complexity of the dynamic programming solution, $\mathcal{O}(T^2K^3)$, becomes prohibitive for large $T$.\footnote{For details regarding the DP algorithm and implementation, the reader is referred to the source-code repository accompanying this article.}
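To make the DP concrete, here is a minimal sketch of classical segmented least-squares for the scalar, constant-segment special case. The paper's algorithm replaces the per-segment constant fit with a dynamics-model fit; this simplified version is for illustration only:

```python
import numpy as np

def constant_fit_costs(y):
    """c[i, j] = SSE of the best constant fit to y[i..j] (inclusive)."""
    T = len(y)
    c = np.zeros((T, T))
    for i in range(T):
        s = s2 = 0.0
        for j in range(i, T):
            s += y[j]
            s2 += y[j] ** 2
            c[i, j] = s2 - s * s / (j - i + 1)  # SSE around the segment mean
    return c

def segmented_ls(y, n_seg):
    """Optimal split of y into n_seg constant segments via DP."""
    T = len(y)
    c = constant_fit_costs(y)
    dp = np.full((n_seg + 1, T), np.inf)  # dp[k, j]: best cost of y[0..j] with k segments
    back = np.zeros((n_seg + 1, T), dtype=int)
    dp[1] = c[0]                          # one segment covering y[0..j]
    for k in range(2, n_seg + 1):
        for j in range(k - 1, T):
            for i in range(1, j + 1):     # i: start index of the last segment
                v = dp[k - 1, i - 1] + c[i, j]
                if v < dp[k, j]:
                    dp[k, j], back[k, j] = v, i
    starts, j = [], T - 1                 # backtrack segment start indices
    for k in range(n_seg, 1, -1):
        i = back[k, j]
        starts.append(i)
        j = i - 1
    return dp[n_seg, T - 1], [0] + sorted(starts)

y = np.array([0.0] * 5 + [3.0] * 5)
cost, starts = segmented_ls(y, 2)
print(cost, starts)  # 0.0 [0, 5]
```

In the paper's setting, each segment cost would instead be the LS residual of an LTI model fitted to that segment, which is where the $K^3$ factor in $\mathcal{O}(T^2K^3)$ enters.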
 
 \subsection{Piecewise linear time evolution}\label{eq:pwlinear}
 A piecewise linear signal is characterized by a \emph{sparse second-order time difference}, i.e., it has a small number of changes in the slope. A piecewise linear time-evolution of the dynamics parameters is hence obtained if we solve the optimization problem.
 \begin{equation} \label{eq:pwlinear}
    \minimize{\w} \normt{\y-\hat{\y}}^2 + \lambda\sum_t \normt{\w_{t+2} -2 \w_{t+1} + \w_t}
 \end{equation}
-The only difference between \labelcref{eq:pwlinear} and~\labelcref{eq:smooth} is the square root in the 2-norm of \labelcref{eq:pwlinear}. The two regularizers are illustrated in~\cref{fig:surf}.
-\begin{figure}[htp]
-    \centering
-    \includegraphics[width=0.9\linewidth]{figs/surf.png}
-    \caption{Comparison between the 2-norm (pointy bottom) and squared 2-norm (round bottom) regularizers. The figure illustrates how the gradient of the 2-norm regularizer remains constant as we move towards the origin, whereas the 2-norm regularizer has a vanishing gradient close to the origin. This difference leads to sparse results when applying 2-norm penalties and small results when applying squared 2-norm penalty.}
-    \label{fig:surf}
-\end{figure}
+% The only difference between \labelcref{eq:pwlinear} and~\labelcref{eq:smooth} is the square root in the 2-norm of \labelcref{eq:pwlinear}.
+% The two regularizers are illustrated in~\cref{fig:surf}.
+% \begin{figure}[htp]
+%     \centering
+%     \includegraphics[width=0.9\linewidth]{figs/surf.png}
+%     \caption{Comparison between the 2-norm (pointy bottom) and squared 2-norm (round bottom) regularizers. The figure illustrates how the gradient of the 2-norm regularizer remains constant as we move towards the origin, whereas the squared 2-norm regularizer has a vanishing gradient close to the origin. This difference leads to sparse results when applying 2-norm penalties and small results when applying squared 2-norm penalty. \recmt{Figure not needed}}
+%     \label{fig:surf}
+% \end{figure}
 
 \subsection{Summary}
-The proposed optimization problems are summarized in~\cref{tab:opts}. The table illustrates how the choice of norm and order of time-differentiation of the parameter vector affects the quality of the resulting solution.
+The proposed optimization problems are summarized in~\cref{tab:opts}. The table illustrates how the choice of regularizer and order of time-differentiation of the parameter vector affect the quality of the resulting solution.
 
 \begin{table}[]
 \centering
@@ -319,23 +324,19 @@ Norm & $D_n$ & Result    \\ \midrule
 
 \subsection{Two-step refinement}
 
-Since many of the proposed formulations of the optimization problem penalize the size of the changes to the parameters, solutions in which the changes are slightly underestimated are favored. To mitigate this issue, a two-step procedure can be implemented where in the first step, change points are identified. In the second step, the penalty on the one-norm is interchanged for equality constraints between the non-zero changes in dynamics.
+Since many of the proposed formulations of the optimization problem penalize the size of the changes to the parameters, solutions in which the changes are slightly underestimated are favored. To mitigate this issue, a two-step procedure can be implemented where, in the first step, change points are identified. In the second step, the one-norm penalty is removed and equality constraints are introduced between consecutive time indices for which no change in dynamics was indicated by the first step.
 
-The second step can be computed very efficiently by noticing that the problem can be split into several identical sub-problems at the knots identified in the first step. The sub-problems have closed-form solutions in case~\cref{sec:pwconstant}.
+The second step can be computed very efficiently by noticing that the problem can be split into several identical sub-problems at the knots identified in the first step. In the case of the problem in~\cref{sec:pwconstant}, the sub-problems have closed-form solutions.
 
-To identify the points at which the dynamics change, we observe the relevant signal among
-\begin{equation}\label{eq:activation}
-    a_{t1} = \normt{ \w_{t+1} - \w_t}, \quad a_{t2} = \normt{\w_{t+2} -2 \w_{t+1} + \w_t}
-\end{equation}
-where $a_t$ taking non-zero values indicate change points.
+To identify the points at which the dynamics change (knots), we observe the argument inside the sum of the regularization term, i.e., $a_{t1} = \normt{ \w_{t+1} - \w_t}$ or $a_{t2} = \normt{\w_{t+2} -2 \w_{t+1} + \w_t}$. Time instances where $a_t$ takes a non-zero value indicate change points.
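A knot-detection step in this spirit amounts to thresholding the norms of the first-order differences of the estimated parameter trajectory. An illustrative sketch with a made-up trajectory:

```python
import numpy as np

# hypothetical estimated parameter trajectory, T = 4 steps, K = 2 parameters
k = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 2.0],
              [1.0, 2.0]])

a = np.linalg.norm(np.diff(k, axis=0), axis=1)  # a[t] = ||k_{t+1} - k_t||_2
knots = np.flatnonzero(a > 1e-6) + 1            # dynamics change entering index 2
print(knots)  # [2]
```

The detected knots then delimit the sub-problems to which the equality-constrained second step is applied.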
 
 \section{Dynamics prior and Kalman filtering}
 \recmt{Reviewer found this section rushed}
-The identifiability of the parameters in a dynamical model hinges on the observability of the dynamics system~\labelcref{eq:dynsys}, or more explicitly, only modes excited by the input $u$ will be satisfactorily identified. If the identification is part of an iterative learning and control scheme, e.g., ILC or reinforcement learning, it might be undesirable to introduce additional noise in the input to improve excitation for identification. This section will introduce prior information about the dynamics which mitigates the issue of poor excitation of the system modes. The prior information might come from, e.g., a nominal model known to be inaccurate, or an estimated global model such as a Gaussian mixture model (GMM). A statistical model of the joint density $p(x_{t+1},x_t,u_t)$ constructed from previously collected tuples $(x_{t+1},x_t,u_t)$ provides a dynamical model of the system through the conditional $p(x_{t+1}|x_t,u_t)$.
+The identifiability of the parameters in a dynamical model hinges on the observability of the dynamical system~\labelcref{eq:dynsys}, or more explicitly, only modes excited by the input $u$ will be satisfactorily identified. If the identification is part of an iterative learning and control scheme, e.g., ILC or reinforcement learning, it might be undesirable to introduce additional noise in the input to improve excitation for identification. This section will introduce prior information about the dynamics which mitigates the issue of poor excitation of the system modes. The prior information might come from, e.g., a nominal model known to be inaccurate, or an estimated global model such as a Gaussian mixture model (GMM). A statistical model of the joint density $p(x_{t+1},x_t,u_t)$ constructed from previously collected tuples $(x_{t+1},x_t,u_t)$ provides a dynamical model of the system through the conditional pdf $p(x_{t+1}|x_t,u_t)$.
 
 We will see that for priors from certain families, the resulting optimization problem remains convex. For the special case of a Gaussian prior over the dynamics parameters or the output, the posterior mean of the parameter vector is conveniently obtained from a Kalman-smoothing algorithm, modified to include the prior.
 
-\subsection{General case}
+\subsection{General case}\label{sec:general}
 
 If we introduce a parameter state $\w$ (cf.~\labelcref{eq:tvk}) and a prior over all parameter-state variables $p(\w_{t}|z_t)$, where the variable $z_t$ might be, for instance, the time index $t$ or the state $x_t$, we have the data likelihood
 \begin{align}\label{eq:cdll}
@@ -348,7 +349,7 @@ which factors conveniently due to the Markov property of a state-space model. Ta
     + \sum_{t=1}^{T-1} \log p(\w_{t+1}|\w_t) &+ \sum_{t=1}^{T} \log p(\w_{t}|z_t)
 \end{split}
 \end{equation}
-For particular choices of density functions in~\labelcref{eq:lcdll}, notably Gaussian and Laplacian, the resulting ML estimation problem becomes convex. The next section will elaborate on the Gaussian case and introduce a recursive algorithm that solves for the full posterior efficiently.
+For particular choices of density functions in~\labelcref{eq:lcdll}, notably Gaussian and Laplacian, the resulting ML estimation problem becomes convex. The next section will elaborate on the Gaussian case and introduce a recursive algorithm that solves for the full posterior efficiently. The Laplacian case, while convex, does not admit an equally efficient algorithm, but is more robust to outliers in the data.
 
 \subsection{Gaussian case} \label{sec:kalmanmodel}
 If all densities are Gaussian and $\w$ is modeled with the Brownian random walk model \labelcref{eq:dynsys} (Gaussian $v_t$), \labelcref{eq:lcdll} can be written in the form (constants omitted)
@@ -360,8 +361,7 @@ If all densities are Gaussian and $\w$ is modeled with the Brownian random walk
     &+ \sum_{t=1}^{T} \norm{\mu_0(z_t) - \w_{t}}^2_{\Sigma^{-1}_0(z_t)}
     \end{split}
 \end{equation}
-\recmt{Meaning of \Sigma not clear}
-for some function $\mu_0(z_t)$ which produces the prior mean of $\w$ given $z_t$, and $\norm{x}_{\Sigma^{-1}}^2 = x\T\Sigma^{-1}x$.
+for some function $\mu_0(z_t)$ which produces the prior mean of $\w$ given $z_t$. Here, $\Sigma_y$, $\Sigma_\w$ and $\Sigma_0(z_t)$ denote the covariance matrices of the state-drift noise, the parameter drift and the prior, respectively, and $\norm{x}_{\Sigma^{-1}}^2 = x\T\Sigma^{-1}x$.
 
 In this special case, we introduce a recursive solution given by a modified Kalman smoothing algorithm, where the conditional mean of the state is updated with the prior. Consider the standard Kalman filtering equations
     \begin{align}
@@ -379,34 +379,54 @@ In this special case, we introduce a recursive solution given by a modified Kalm
         \bar{P}_{t|t} &= P_{t|t} - \bar{K}_t P_{t|t}\label{eq:postcov}
     \end{align}
     where $\bar{\cdot}$ denotes the posterior value. This additional correction can be interpreted as receiving a second measurement $\mu_0(z_t)$ with covariance $\Sigma_0(z_t)$.
-    For the Kalman-smoothing algorithm, $\hat{x}_{t|t}$ and $P_{t|t}$ in \labelcref{eq:postx,eq:postcov} are replaced for $\hat{x}_{t|T}$ and $P_{t|T}$.
+    For the Kalman-smoothing algorithm, $\hat{x}_{t|t}$ and $P_{t|t}$ in \labelcref{eq:postx,eq:postcov} are replaced with $\hat{x}_{t|T}$ and $P_{t|T}$.
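The extra correction step can be sketched as follows, treating the prior mean as a second measurement of the full parameter state. The explicit gain expression $\bar{K}_t = P_{t|t}(P_{t|t}+\Sigma_0)^{-1}$ is an assumption of this sketch (it is the standard measurement update with unit output matrix); only the covariance update \labelcref{eq:postcov} is stated verbatim in the text.

```python
import numpy as np

def prior_update(x, P, mu0, Sigma0):
    """Extra Kalman update incorporating the prior N(mu0, Sigma0).

    Treats mu0 as a direct measurement of the full state with noise
    covariance Sigma0, assuming the gain Kbar = P (P + Sigma0)^{-1}.
    The covariance update Pbar = P - Kbar P follows eq. (postcov).
    """
    Kbar = P @ np.linalg.inv(P + Sigma0)
    xbar = x + Kbar @ (mu0 - x)   # posterior mean, eq. (postx)
    Pbar = P - Kbar @ P           # posterior covariance, eq. (postcov)
    return xbar, Pbar

# A near-deterministic prior (Sigma0 -> 0) pins the estimate to mu0:
xbar, Pbar = prior_update(np.zeros(2), np.eye(2), np.ones(2), 1e-12 * np.eye(2))
```

Conversely, an uninformative prior (large $\Sigma_0$) leaves the filtered estimate essentially unchanged, so the correction degrades gracefully when the prior is weak.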
 
 A prior over the output of the system, or a subset thereof, is straightforward to include in the estimation by means of an extra update step, with $C$, $R_2$ and $y$ replaced with their corresponding values according to the prior.
 
-Laplacian densities appearing in~\labelcref{eq:lcdll} give rise to sparsity-promoting terms on the form $\normt{\cdot}$ in~\labelcref{eq:prioropt}. The resulting optimization problem remains convex, but must be solved with an iterative method, similar to~\labelcref{eq:pwconstant}.
-
-
 \subsection{Kalman filter for identification}
-\subsubsection{Slow}
+We can employ the Kalman-based algorithm to solve two of the proposed optimization problems:
+\subsubsection{Low frequency}
 The Kalman filter can be used for solving identification problems like~\labelcref{eq:slow} by noting that~\labelcref{eq:slow} is the negative log-likelihood of the dynamics model~\labelcref{eq:dynsys}.
 \subsubsection{Smooth}
 
-To develop a Kalman-filter based algorithm for solving~\labelcref{eq:smooth}, we augment the model \labelcref{eq:dynsys} with the state variable $\dot{k}_t = k_{t} - k_{t-1}$ and note that $\dot{k}_{t+1} - \dot{k}_t = k_{t+1} - 2k_t + k_{t-1}$. We thus introduce the augmented-state model
+To develop a Kalman-filter based algorithm for solving~\labelcref{eq:smooth}, we augment the model \labelcref{eq:dynsys} with the state variable $k^\prime_t = k_{t} - k_{t-1}$ and note that $k^\prime_{t+1} - k^\prime_t = k_{t+1} - 2k_t + k_{t-1}$. We thus introduce the augmented-state model
 \begin{align}\label{eq:dynsys2}
-   \begin{bmatrix}k_{t+1} \\ \dot{k}_{t+1}\end{bmatrix} &= \begin{bmatrix}I_K & I_K \\ 0_K & I_K\end{bmatrix} \begin{bmatrix}k_{t} \\ \dot{k}_{t}\end{bmatrix} + \begin{bmatrix}0 \\ w_{t}\end{bmatrix}\\
-   y_t &=  \begin{bmatrix}\big(I_n \otimes \bmatrixx{x\T & u\T}\big) & 0\end{bmatrix} \begin{bmatrix}k_{t} \\ \dot{k}_{t}\end{bmatrix}
+   \begin{bmatrix}k_{t+1} \\ k^\prime_{t+1}\end{bmatrix} &= \begin{bmatrix}I_K & I_K \\ 0_K & I_K\end{bmatrix} \begin{bmatrix}k_{t} \\ k^\prime_{t}\end{bmatrix} + \begin{bmatrix}0 \\ w_{t}\end{bmatrix}\\
+   y_t &=  \begin{bmatrix}\big(I_n \otimes \bmatrixx{x\T & u\T}\big) & 0\end{bmatrix} \begin{bmatrix}k_{t} \\ k^\prime_{t}\end{bmatrix}
 \end{align}
 which is of a form suitable for filtering/smoothing with the machinery developed above.
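The assembly of the augmented matrices in \labelcref{eq:dynsys2} can be sketched as follows. The parameter layout follows $I_n \otimes \bmatrixx{x\T & u\T}$ as in the output equation; the function name and the absence of an affine term are assumptions of this sketch.

```python
import numpy as np

def augmented_model(x, u, n):
    """Build the augmented-state matrices of eq. (dynsys2).

    x, u are the current state and input, n = dim(x), and
    K = n*(n+m) is the number of parameters (entries of A and B).
    """
    K = n * (len(x) + len(u))
    I = np.eye(K)
    # [[I_K, I_K], [0_K, I_K]] acting on the stacked state [k_t; k'_t]
    A_aug = np.block([[I, I], [np.zeros((K, K)), I]])
    # output matrix [ I_n ⊗ [x' u'] , 0 ]
    C = np.kron(np.eye(n), np.concatenate([x, u])[None, :])
    C_aug = np.hstack([C, np.zeros((n, K))])
    return A_aug, C_aug

A_aug, C_aug = augmented_model(np.array([1.0, 2.0]), np.array([0.5]), n=2)
```

With $n=2$, $m=1$ this yields $K=6$, a $12\times 12$ transition matrix, and a $2\times 12$ output matrix whose right half is zero, so the measurement constrains $k_t$ only and $k^\prime_t$ is inferred through the dynamics.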
 
+\subsubsection{General case}
+The Kalman-filter based identification methods can be generalized to solve optimization problems where the argument of the regularizer in \labelcref{eq:smooth} is replaced with a general linear operation on the parameter vector, $P(z)k$. We have the following proposition:
+\begin{proposition}
+    Any optimization problem of the form
+    \begin{equation} \label{eq:generalkalman}
+       \minimize{\w} \normt{\y-\hat{\y}}^2 + \lambda^2\sum_t \normt{P(z)\w_t}^2
+    \end{equation}
+    where $P(z)$ is a polynomial of degree $n>0$ in the time-difference operator $z$ with $P(1) = 0$, can be solved by a Kalman smoother applied to an autonomous state-space system.
+\end{proposition}
+\begin{proof}
+    Let $P^*(z^{-1}) = z^{-n} P(z)$. We assume without loss of generality that $P^*(0) = 1$, since any constant $P^*(0)$ can be factored out of the polynomial. $Q(z^{-1}) = P^*(z^{-1})^{-1}$ is a strictly proper transfer function and has a realization as a state-space system of degree $n$. Since $Q(z^{-1})$ is strictly proper, the realization has no direct term. Driving this realization with the process noise $w_t$ generates the penalized parameter evolution, and \labelcref{eq:generalkalman} is thus solved by the Kalman smoother developed above, applied to the correspondingly augmented model.
+\end{proof}
+For~\labelcref{eq:smooth}, $P(z)$ equals $z^2 - 2z + 1 = (z-1)^2$, and the dynamical model describing the parameter evolution is augmented with a state corresponding to the polynomial $Q(z) = (z-1)^{-1}P(z) = z - 1$.
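For a single scalar parameter, the realization used in the proof can be sketched as a companion form; the full model applies this blockwise (via $I_K\otimes{}$) to the whole parameter vector. The coefficient convention and function name are assumptions of this sketch.

```python
import numpy as np

def regularizer_realization(p):
    """Companion-form realization associated with P*(z^{-1}).

    p holds the coefficients of P*(z^{-1}) = 1 + p[1] z^{-1} + ... + p[n] z^{-n}
    (monic by the normalization in the proof). Returns (A, B) such that the
    recursion s_{t+1} = A s_t + B w_t reproduces the parameter evolution
    k_t = -p[1] k_{t-1} - ... - p[n] k_{t-n} + w_t.
    """
    n = len(p) - 1
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(p[1:])   # first companion row
    A[1:, :-1] = np.eye(n - 1)     # shift structure
    B = np.zeros((n, 1))
    B[0, 0] = 1.0
    return A, B

# P(z) = (z-1)^2  =>  P*(z^{-1}) = 1 - 2 z^{-1} + z^{-2}
A, B = regularizer_realization([1.0, -2.0, 1.0])
```

For this example the realization has a double pole at $z=1$, i.e., a double integrator in the parameter dynamics, consistent with the double-difference penalty in \labelcref{eq:smooth}.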
+
 
 
 \section{Well-posedness and identifiability}\label{sec:identifiability}
+\begin{theorem}
+    The proposed optimization problems have unique global minima provided that $A$ and $u$...
+\end{theorem}
+\begin{proof}
+    The cost function is a sum of two convex terms. For the global minimum to be non-unique, the Hessians of the two terms must share a nullspace.
+    Let $k$ be a minimizer of $\lambda^2\sum_t \normt{\w_{t+1} - \w_{t}}^2$; then there exists a basis of dimension $K$ spanning the nullspace $\mathcal{N}_D$ of $D_1\T D_1$. Let $v$ be a vector in $\mathcal{N}_D$. For $\Phi\T \Phi$ to share a nullspace with $D$, we must have $\Phi\T\Phi v = 0$, or equivalently $y_t = Ck_t = C(k_t+v_t) \Longrightarrow Cv_t = 0$.
+
+\end{proof}
 To assess the well-posedness of the proposed identification methods, we start
 by noting that the problem of finding $A$ in $x_{t+1} = Ax_t$ given a pair
 $(x_{t+1},x_t)$ is an ill-posed problem in the sense that the solution is
 non-unique. If we are given several pairs $(x_{t+1},x_t)$, for different $t$, while
 $A$ remains constant, the problem becomes over-determined and well-posed in the
-least-squares sense. The LTI-case in~\cref{sec:lti} is well posed according to
+least-squares sense, provided that the vectors $\left\{x_t^{(i)}\right\}_{t=1}^T$ span $\mathbb{R}^n$. The LTI-case in~\cref{sec:lti} is well posed according to
 classical results, when $\Phi$ has full column rank.
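This rank condition is directly checkable from data. A minimal NumPy sketch, assuming the regressor matrix $\Phi$ is built with rows $\bmatrixx{x_t\T & u_t\T}$ as in the LTI least-squares problem:

```python
import numpy as np

def well_posed_lti(X, U):
    """Check the LTI least-squares rank condition: the regressor matrix
    Phi, with rows [x_t' u_t'], must have full column rank n + m."""
    Phi = np.hstack([X, U])  # T x (n + m)
    return bool(np.linalg.matrix_rank(Phi) == Phi.shape[1])

rng = np.random.default_rng(0)
X, U = rng.standard_normal((50, 2)), rng.standard_normal((50, 1))
print(well_posed_lti(X, U))                  # well-excited data: True
print(well_posed_lti(np.zeros((50, 2)), U))  # states stuck at zero: False
```

The second call illustrates the ill-posed case: if the state sequence does not span $\mathbb{R}^n$, the corresponding columns of $\Phi$ are rank deficient and the least-squares solution is non-unique.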
 
 When we extend our view to LTV models, the number of free
@@ -434,7 +454,7 @@ $\operatorname{rank}(D_2\T D_2) = K(T-2)$ respectively
 $\operatorname{rank}(D_1\T D_1) = K(T-1)$. The matrix $\tilde{\A}\T \tilde{\A} + D\T D$
 will thus be full rank whenever $\tilde{\A}$ has rank
 $2K$ or more and its row space covers the $2K$-dimensional null
-space of $D$. This will be the case for any non-zero
+space of $D$. \cmt{Make these statements into propositions and introduce asymptotic analysis?} This will be the case for any non-zero
 state and input sequences of sufficient length, provided that the input, $u$,
 is persistently exciting of sufficient order~\cite{johansson1993system}.
 
@@ -443,6 +463,7 @@ which will yield the desired results for problems~\labelcref{eq:slow,eq:smooth}.
 
 
 \section{Example}
+\cmt{Add two link example where a stiff contact is established and there is a sign change in control signal and hence friction. Apply pwconstant method and maybe scrap linear system example. Fig 3 can be reproduced with analytical jacobian as comparison. Mention in abstract that we then evaluate on one non-linear but smooth system (pendcart) and one non-smooth system (twolink).}
 \recmt{Only linear systems considered (pend-cart is actually not linear).} \cmt{Highlight that pendcart is nonlinear or introduce simple robotic example. Mention early that an LTV system along a trajectory does not generalize far for a nonlinear system, hence the use of KL constraint.}
 % \begin{figure}
 %     \centering
@@ -488,7 +509,7 @@ The input was Gaussian noise of zero mean and unit variance, state transition no
     \setlength{\figureheight }{5cm}
     \input{figs/states.tex}
     \figurecaptionreduction
-    \caption{Piecewise constant state-space dynamics. True state values are shown with dashed lines. Gaussian state-transition and measurement noise with $\sigma = 0.2$ were added.}
+    \caption{Piecewise constant state-space dynamics. True state values are shown with dashed lines. Gaussian state-transition and measurement noise with $\sigma = 0.2$ were added. \recmt{Hard to see and understand}}
     \label{fig:states}
 \end{figure}
 \begin{figure}
@@ -506,20 +527,23 @@ The input was Gaussian noise of zero mean and unit variance, state transition no
 \section{Reinforcement learning}
 \recmt{More details needed}
 In this example, we use the proposed methods to identify LTV dynamics models for
-reinforcement learning. The task is to stabilize a pendulum attached to a moving
+reinforcement learning. The goal of the task is to dampen oscillations of a pendulum attached to a moving
 cart by means of moving the cart, with bounds on the control signal and a quadratic
-cost on states and control. To this end, we perform a series of experiments,
-whereafter each we 1) fit a dynamics model, 2) optimize the cost function under
+cost on states and control. Due to the nonlinear nature of the pendulum dynamics,
+linear expansions of the dynamics in the upward (initial) position and downward (final) position have poles
+on opposite sides of the imaginary axis. To this end, we perform a series of experiments,
+after each of which we 1) fit a dynamics model along the last obtained trajectory,
+2) optimize the cost function under
 the model using differential dynamic programming,\footnote{Implementation made available at
 \href{github.com/baggepinnen/DifferentialDynamicProgramming.jl}{github.com/baggepinnen/DifferentialDynamicProgramming.jl}}
 with bounds on the deviation between the new trajectory and the last trajectory
-in order to stay close to the validity region of the model. This reinforcement-learning strategy has shown great promise lately and is referred to as trust-region policy optimization~\cite{schulman2015trust}. We compare three different models,
-the ground truth system model, an LTV model (\labelcref{eq:smooth}) and an LTI model.
+in order to stay close to the validity region of the model. This reinforcement-learning strategy has shown great promise lately; it is part of the guided policy search framework~\cite{levine2013guided} and bears similarities to trust-region policy optimization~\cite{schulman2015trust}. We compare three different models,
+the ground truth system model, an LTV model (obtained by solving \labelcref{eq:smooth}) and an LTI model.
 The total cost over $T=500$ time steps is shown as a function of learning iteration
 in~\cref{fig:ilc}. The figure illustrates how the learning procedure reaches the
 optimal cost of the ground truth model when an LTV model is used, whereas when
 using an LTI model, the learning diverges. The figure further illustrates that if
-the LTV model is fit using a prior (\cref{sec:kalmanmodel}), the learning speed is increased. The prior in this case was constructed from the linearized system model which is unavailable in a real application, but serves as an indication of the effectiveness of inclusion of a prior in this example.
+the LTV model is fit using a prior (\cref{sec:kalmanmodel}), the learning speed increases. The prior in this case was constructed from the true system model, linearized around the last trajectory. This strategy is unavailable in a real application, but the experiment serves as an indication of the effectiveness of including a prior in this example. Future work targets the incremental estimation of such priors.
 
 
 \begin{figure}[htp]
@@ -527,7 +551,7 @@ the LTV model is fit using a prior (\cref{sec:kalmanmodel}), the learning speed
     \setlength{\figurewidth}{0.99\linewidth}
     \setlength{\figureheight }{5cm}
     \input{figs/ilc.tex}
-    \caption{Reinforcement learning example. Three different model types are used to iteratively optimize the trajectory of a pendulum on a cart.}
+    \caption{Reinforcement learning example. Three different model types are used to iteratively optimize the trajectory of a pendulum on a cart. Due to the nonlinear nature of the pendulum dynamics, linear expansions of the dynamics in the upward and downward positions have poles on opposite sides of the imaginary axis.}
     \label{fig:ilc}
 \end{figure}
 
@@ -558,25 +582,25 @@ the LTV model is fit using a prior (\cref{sec:kalmanmodel}), the learning speed
 
 
 \section{Discussion}
-\recmt{Some paragraphs hard to follow. Linear operator extension can be moved to theory section. }
-This article is written focused on state-space models, where the state sequence
-is assumed to be known. This may seem like an overly optimistic setting at first,
-but all methods and algorithm discussed apply equally well to the input-output
-setting, where only inputs and outputs are available. Indeed, any
-state-space system
-\begin{align}
-    x_{t+1} &= Ax_t + Bu_t\\
-    y_t &= Cx_t + Du_t
-\end{align}
-can be interpreted as a discrete-time transfer function in the
-$z$-transform operator
-\begin{equation}
-    \dfrac{Y(z)}{U(z)} = \big(C(zI-A)^{-1}B + D\big) = \dfrac{B(z)}{A(z)}
-\end{equation}
-The problem of estimating the coefficients in the polynomials $B$ and $A$ remains
-linear given the input-output sequences $u$ and $y$. Extension to time-varying
-dynamics is straight forward.
-\addtolength{\textheight}{-7cm}
+\recmt{Some paragraphs hard to follow. Linear operator extension can be moved to theory section. } \recmt{Shorten section and reorganize, omit tf eq} \recmt{Rewrite first two sentences}
+
+% This article is written focused on state-space models, where the state sequence
+% is assumed to be known. This may seem like an overly optimistic setting at first,
+% but all methods and algorithm discussed apply equally well to the input-output
+% setting, where only inputs and outputs are available. Indeed, any
+% state-space system
+% \begin{align}
+%     x_{t+1} &= Ax_t + Bu_t\\
+%     y_t &= Cx_t + Du_t
+% \end{align}
+% can be interpreted as a discrete-time transfer function in the
+% $z$-transform operator.
+% The problem of estimating the coefficients in the transfer function remains
+% linear given the input-output sequences $u$ and $y$. Extension to time-varying
+% dynamics is straight forward.
+
+This article presents methods for estimation of linear, time-varying models. When estimating an LTV model from a trajectory obtained from a nonlinear system, one effectively estimates the linearization of the system around that trajectory. A first-order approximation to a nonlinear system is not guaranteed to generalize well as deviations from the trajectory become large. Certain methods, such as iterative learning control and trajectory-centric reinforcement learning, use a first-order approximation to the dynamics for efficient optimization, while the validity of the approximation is ensured by incorporating penalties or constraints between two consecutive trajectories. The methods presented allow very efficient learning of this first-order approximation, owing to the prior belief over the nature of the change in dynamics parameters encoded by the regularization terms. By postulating such a prior belief, less demand is put on the data required for identification. The identification process will thus not interfere with operation in the way that adding excessive noise to the input for identification purposes would. This allows learning of flexible, over-parametrized models that fit available data well, which makes the proposed identification methods attractive in applications such as guided policy search (GPS)~\cite{levine2013guided, levine2015learning} and non-linear iterative learning control (ILC)~\cite{bristow2006survey}, where they can lead to dramatically decreased sample complexity.
+% \addtolength{\textheight}{-7cm}
 
 When faced with a system where time-varying dynamics is suspected and no particular
 knowledge regarding the dynamics evolution is available, or when the dynamics are known to vary slowly,
@@ -589,8 +613,12 @@ continuous auxiliary variable, such as temperature, altitude or velocity.
 If a smooth parameter drift is found to correlate with an auxiliary variable,
 LPV-methodology can be employed to model the dependence explicitly.
 
-Identification of systems known to contain gain scheduling may benefit from formulation~\labelcref{eq:pwlinear}.
-Gain scheduling is usually implemented with overlapping regions, in between which linear interpolation is used.
+Dynamics may change abruptly as a result of, e.g., system failure, change of operating mode,
+or when a sudden disturbance enters the system, such as a policy change affecting a market or
+a window opening affecting the indoor temperature. The identification method \labelcref{eq:pwconstant} can be employed
+to identify when such changes occur, without specifying a priori how many changes are expected.
+
+
 
 
 For simplicity, the regularization weights were kept as simple scalars in this article.
@@ -598,10 +626,6 @@ However, all terms $\lambda\normt{\Delta k}^2 = (\Delta k)\T (\lambda I) (\Delta
 can be generalized to $(\Delta k)\T \Lambda (\Delta k)$, where $\Lambda$ is an arbitrary positive definite matrix.
 This allows incorporation of different scales for different variables with little added implementation complexity.
 
-The main benefit of the regularization is the prior belief over the nature of the change in dynamics parameters it represents. By postulating a prior belief that the dynamics parameters change in a certain way, less demand is put on the data required for identification. The identification process will thus not interfere with operation in the same way as if noise would be added to the input for identification purposes. This allows learning of flexible, over-parametrized models that fit available data well. This makes the proposed identification methods attractive in applications such as guided policy search (GPS)~\cite{levine2013guided, levine2015learning} and non-linear iterative learning control (ILC)~\cite{bristow2006survey}, where they can lead to dramatically decreased sample complexity.
-
-
-The Kalman-filter based identification methods can be generalized to solving optimization problems where the argument in the regularizer appearing in \labelcref{eq:smooth} is replaced for an arbitrary linear operation on the parameter vector, $P(z)k$. The extension required is to factor out the polynomial $z-1$ from the polynomial $P(z)$, which for~\labelcref{eq:smooth} equals $z^2 - 2z + 1 = (z-1)^2$, and augmenting the dynamical model describing the parameter evolution with a state corresponding to the polynomial $Q(x) = (z-1)^{-1}P(z)$. \cmt{Expand and put in theory section.}
 
 
 The drawbacks of LTV models include situations where the observed time variation comes from non-linear effects. Non-linearities in the system may cause an apparent time-variation as the state varies. While an LTV-model can capture this variation, it will fail to generalize well to a subsequent experiment where the states take a significantly different trajectory. Many non-linear systems are, however, approximately \emph{locally} linear, such that they are well described by a linear model in a small neighborhood around the linearization/operating point. LTV models can thus be efficiently used for model-based reinforcement learning with variants of trust-region based optimization~\cite{schulman2015trust}.
-- 
GitLab