\section{Stochastic Analysis}
\label{sec:analysis}
Section~\ref{sec:design} introduced a control design technique that
exploits information about the probability of sequences of deadline
hits and misses for the control job. Here, we provide a framework to
robustly estimate these probabilities, and at the same time preserve
a pessimistic bound that allows us to mitigate the effect of
worst-case conditions. We formulate the estimation problem as a
chance-constrained optimization problem~\cite{miller1965CCS}, i.e.,
an optimization problem where we look for the probabilities of
different sequences of hits and misses given the worst-case
realization of the uncertainty inherently present in the taskset execution.
Analytical approaches extracting the probability of hits and misses
for a schedule of jobs are either extremely
pessimistic~\cite{chen2017probabilistic} or have a high computational
complexity~\cite{von2018efficiently}. This limits the applicability
of these techniques in non-trivial cases. Moreover, there are few
works dealing with joint probabilities of consecutive jobs,
like~\cite{tanasa2015probabilistic}, but they still lack scalability.
To handle the scalability issue, we adopt a simulation-based
approach, backed up by the \emph{scenario
theory}~\cite{calafiore2006scenario}, that \emph{empirically}
performs the uncertainty characterization, and provides
\emph{formal guarantees} on the robustness of the resulting
estimation. The scenario theory allows us to exploit the fact that
simulating the taskset
execution (with statistical significance) is less computationally
expensive than an analytical approach, which runs into the
combinatorial explosion of the different possible uncertainty
realizations. In practice, this means that we: (i) sample the
execution times from the
probability distributions specified for each
task, $f_i^{\mathcal{C}}(c)$, (ii) schedule the tasks, checking the
resulting set of sequences $\Omega$, and (iii) find the worst-case
sequence $\omega_*$ based on the chosen cost function.
The probabilities of sequences of hits and misses are
then computed based on this sequence, and used in the design of
the controller to be robust with respect to the sequence. We use the
scenario theory to quantify the probability $\varepsilon$ of not having
extracted the \emph{true} worst-case sequence and the confidence in the
process $1-\beta$, as a function of the number of extracted samples.
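As an illustration, the following minimal Python sketch implements
the three steps above; the helper names (\texttt{sample\_c} for
drawing from $f_i^{\mathcal{C}}(c)$, \texttt{schedule} for the
scheduler returning the control task's hit/miss sequence) are
placeholders, not part of a specific implementation.
\begin{verbatim}
import random

def sample_execution_times(tasks, n_job, rng):
    # (i) draw one execution time per job from each task's
    # distribution f_i^C (sample_c is a placeholder sampler)
    return {t.name: [t.sample_c(rng) for _ in range(n_job)]
            for t in tasks}

def worst_case_sequence(tasks, schedule, j_seq, n_sim, n_job, seed=0):
    rng = random.Random(seed)
    omega_star, cost_star = None, float("-inf")
    for _ in range(n_sim):
        c = sample_execution_times(tasks, n_job, rng)
        # (ii) schedule the taskset, obtaining a hit/miss sequence
        omega = schedule(tasks, c)
        # (iii) keep the sequence with the highest cost seen so far
        if j_seq(omega) > cost_star:
            omega_star, cost_star = omega, j_seq(omega)
    return omega_star
\end{verbatim}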
\subsection{Scenario Theory}
\label{sec:analysis:scenario}
The Scenario Theory has been developed in the field of robust
control~\cite{calafiore2006scenario} to provide robustness guarantees
for convex optimization problems in the presence of probabilistic
uncertainty. A characteristic of these problems is that accounting
for all the possible uncertainty realizations might be achievable
analytically, but it is computationally too heavy or results in
pessimistic bounds. The Scenario Theory proposes an empirical method
in which samples are drawn from the possible realizations of the
uncertainty. By providing a lower bound on the number of samples to
be drawn from the uncertainty space, it provides statistical
guarantees on the value of the cost function with respect to the
general case, provided that the sources of uncertainty are the same.
One of the advantages of this approach is that there is no need to
enumerate the uncertainty sources, the only requirement being the
possibility to draw representative samples. This eliminates the need
to make assumptions on the correlation between the probabilities of
deadline misses in subsequent jobs. If interference occurs between
the jobs, it empirically shows up when the system behavior is
sampled. While there is no requirement on
subsequent jobs interfering with one another, there is a requirement
that different sequences are independent (i.e., each sequence
represents an execution of the entire taskset of a given length, in
the same or possibly different conditions). Taking the worst observed
case in a set of experiments, the Scenario Theory allows us to
estimate the probability that something worse than what is observed
can happen during the execution of the system.
Specifically, for a sequence $\omega$ we define a cost function
$J_{seq}(\omega)$, that determines when we consider a sequence worse
than another (from the perspective of the controller execution).
Denoting with $\mu_{\text{tot}}(\omega)$ the total number of job
skips and deadline misses that the control task experienced in
$\omega$, and with $\mu_{\text{seq}}(\omega)$ the maximum
number of consecutive deadline misses or
skipped jobs in $\omega$, we use as a cost function
\begin{equation}\label{eq:Jseq}
J_{seq}(\omega) = \mu_{\text{tot}}(\omega)\,\mu_{\text{seq}}(\omega)
\end{equation}
to determine the worst-case sequence of hits and misses. Given a set
of simulated sequences $\Omega = \{ \omega_1, \dots
\omega_{n_{\text{sim}}} \}$, we select $\omega_* =
\text{arg}\,\max\limits_{\omega \in \Omega}J_{seq}(\omega)$. The
number of simulations, $n_{\text{sim}}$, is selected based on the
scenario approach, and provides probabilistic bounds on the
uncertainty realization, giving us formal guarantees on the design
according to the chosen cost function.
The choice of the cost function is, however, not unique. For
instance, the number of subsequences of a given length with at least
a given number of deadline misses, or the length of the shortest
subsequence with more than a given number of deadline misses, would
be other viable choices.
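As a concrete sketch, Equation~\eqref{eq:Jseq} can be evaluated on a
sequence encoded as a list of booleans (\texttt{True} for a hit,
\texttt{False} for a miss or skipped job); the encoding is only an
illustrative choice.
\begin{verbatim}
def j_seq(omega):
    # J_seq(omega) = mu_tot(omega) * mu_seq(omega)
    mu_tot = omega.count(False)      # total misses and skips
    mu_seq, run = 0, 0
    for hit in omega:
        run = 0 if hit else run + 1  # current run of misses
        mu_seq = max(mu_seq, run)    # longest consecutive run
    return mu_tot * mu_seq

# Example: [True, False, False, True, False] gives
# mu_tot = 3 and mu_seq = 2, hence J_seq = 6.
\end{verbatim}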
\subsection{Formal Guarantees}
\label{sec:analysis:guarantees}
The Scenario Theory allows us to compute the number $n_{\text{sim}}$
of simulations that we need to conduct to reach the required
robustness $\varepsilon$ and confidence $1-\beta$. The parameter
$\varepsilon$ is a bound on the probability of the obtained result
being wrong, i.e., on the probability that another simulation would
lead to a sequence with a higher cost function $J_{seq}$ than
$\omega_*$. The parameter $1-\beta$ represents the confidence we
have in this result, i.e., $\beta$ is the probability of
$\varepsilon$ being an incorrect bound. The confidence $1-\beta$ can
also be interpreted as the probability that the drawn
$n_{\text{sim}}$ sequences are representative enough of the whole set
of possible uncertainty realizations.
Equation~\eqref{sec:scenario:testnumber} shows the relation between
the number of experiments $n_{\text{sim}}$, $\varepsilon$ and
$\beta$~\cite{calafiore2006scenario}. Here, $d$ is the number of
optimization variables used for the selection. The cost function
$J_{seq}$ that we defined takes as argument only a sequence $\omega$,
hence $d=1$.
\begin{equation}
\sum\limits_{i=0}^{d-1} \binom{n_{\text{sim}}}{i} \varepsilon^i (1-\varepsilon)^{n_{\text{sim}}-i} \leq \beta .
\label{sec:scenario:testnumber}
\end{equation}
Specifying $\beta$ and $\varepsilon$ uniquely determines the minimum
required $n_{\text{sim}}$. If $\beta$ and $\varepsilon$ are sufficiently
small, we can use the worst-case sequence for the design of the
controller with high confidence.
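For $d=1$, Equation~\eqref{sec:scenario:testnumber} reduces to
$(1-\varepsilon)^{n_{\text{sim}}} \leq \beta$, i.e., $n_{\text{sim}}
\geq \ln\beta / \ln(1-\varepsilon)$; the following sketch finds the
smallest $n_{\text{sim}}$ numerically for any $d$.
\begin{verbatim}
import math

def min_n_sim(eps, beta, d=1):
    # Smallest n with
    # sum_{i=0}^{d-1} C(n,i) eps^i (1-eps)^(n-i) <= beta
    n = d
    while sum(math.comb(n, i) * eps**i * (1 - eps)**(n - i)
              for i in range(d)) > beta:
        n += 1
    return n

# Example: eps = 0.01, beta = 1e-6, d = 1  ->  min_n_sim = 1375
\end{verbatim}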
\subsection{Application and Threats to Validity}
\label{sec:analysis:application}
Like any other empirical approach, the validity of the Scenario
Theory depends on the representativeness of the sampling set. In our
case, for example, the validity of our results depends on the
significance of the probabilistic execution time distributions of all
the tasks.
Furthermore, the length of the simulations is a critical parameter.
We simulate the system for a number $n_\text{job}$ of executions of
the control task. Clearly, we want to select $n_\text{job}$ to cover
an entire hyperperiod (to achieve complete analysis of the
interferences between the tasks). In practice, we want to be able to
detect cascaded effects that might happen due to the probabilistic
nature of the execution times of the tasks. Some samplings could in
fact make the utilization of instances of the taskset greater than
one. For this reason, simulations that include several hyperperiods
should be performed. On top of that, significance with respect to the
controlled physical system is required (since the existence of the
hyperperiod is not always guaranteed); hence, the length of the
simulated sequences should cover its dynamics.
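As a minimal sketch (assuming integer task periods, so that the
hyperperiod exists), the number of simulated control jobs covering a
chosen number of hyperperiods can be computed as follows; the helper
and the example values are illustrative only.
\begin{verbatim}
import math
from functools import reduce

def n_job_for(periods, control_period, k_hyper=5):
    # Number of control-task jobs covering k_hyper hyperperiods;
    # periods are integers in a common unit (e.g., microseconds).
    hyperperiod = reduce(math.lcm, periods)
    return k_hyper * hyperperiod // control_period

# Example: periods 10, 20, 50 (ms) and control period 20 ms give
# a 100 ms hyperperiod, so 5 hyperperiods correspond to 25 jobs.
\end{verbatim}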