Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple data analysis and pattern recognition tools to complex systems that achieve superhuman performance on various tasks. Ensuring that they do not exhibit undesirable behavior—that they do not, for example, cause harm to humans—is therefore a pressing problem. We propose a general and flexible framework for designing machine learning algorithms. This framework simplifies the problem of specifying and regulating undesirable behavior. To show the viability of this framework, we used it to create machine learning algorithms that precluded the dangerous behavior caused by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.

[...] Let D, called the data, be the input to the ML algorithm. For example, in the classification setting, D is not a single labeled training example but rather all of the available labeled training examples. D is a random variable and the source of randomness in our subsequent statements regarding probability. An ML algorithm is a function a, where a(D) is the solution output by the algorithm when trained on data D. Let Θ be the set of all possible solutions that an ML algorithm could output.

Our framework mathematically defines what an algorithm should do in a way that allows the user to directly place probabilistic constraints on the solution, a(D), returned by the algorithm. This differs from the standard ML approach wherein the user can only indirectly constrain a(D) by restricting or modifying the feasible set Θ or objective function f. Concretely, algorithms constructed using our framework are designed to satisfy constraints of the form Pr(g(a(D)) ≤ 0) ≥ 1 − δ, where g: Θ → ℝ defines a measure of undesirable behavior (as illustrated later by example) and δ ∈ [0, 1] limits the admissible probability of undesirable behavior.
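To make the constraint Pr(g(a(D)) ≤ 0) ≥ 1 − δ concrete, here is a minimal sketch of what a user-supplied g might look like. Everything in it is a hypothetical illustration, not the paper's actual interface: a toy linear classifier θ, and a fairness measure where g(θ) ≤ 0 means the gap in mean prediction error between two groups stays within a tolerance ε.

```python
import numpy as np

# Hypothetical measure of undesirable behavior g: Θ → ℝ (illustration only).
# g(theta) <= 0 means "acceptable": the gap in mean prediction error
# between group 0 and group 1 is within the tolerance epsilon.
def g(theta, X, y, group, epsilon=0.05):
    pred = X @ theta > 0                    # toy linear classifier
    err = pred != y                         # per-example error indicator
    gap = abs(err[group == 0].mean() - err[group == 1].mean())
    return gap - epsilon                    # <= 0 iff the gap is within epsilon
```

The point of the interface is that the user only supplies this function; the algorithm itself is responsible for ensuring the probabilistic guarantee over the randomness in D.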
Using our framework for designing ML algorithms involves three steps:

1) Define the goal for the algorithm design process. [...] Note that this is in contrast to the standard ML approach: In the standard ML approach, Eq. 1 defines the goal of the algorithm, which is to produce a solution with a given set of properties, whereas in our framework, Eq. 2 defines the goal of the designer, which is to produce an algorithm with a given set of properties. [...]

2) Define the interface that the user will use. The user should have the freedom to specify one or more gi that capture the user's own definition of undesirable behavior. [...]

3) Create the algorithm. [...] In practice, designers rarely produce algorithms that cannot be improved upon, which implies that they may only find approximate solutions to Eq. 2. Our framework allows for this by requiring a to satisfy only the probabilistic constraints while attempting to optimize f; we call such algorithms Seldonian. [...]

The Seldonian algorithms and applications we present below are illustrations to show that it is possible and tractable to design Seldonian algorithms that can tackle important problems of interest. [...] We must therefore provide the user with a way to tell our algorithm the statistic to be bounded, without requiring the user to provide the value, g(θ), of the statistic for different solutions θ (see step 2 above). To achieve this (14), we allow the user to specify a sample statistic ĝ(θ, D), and we define g(θ) to be the expected value of this sample statistic: g(θ) = E[ĝ(θ, D)], where E denotes expected value.
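The three steps above can be sketched as code. This is my own simplified reconstruction, not the paper's implementation: it assumes the data has been split into a candidate-selection set and a held-out safety set, that samples of ĝ lie in a known interval [a, b], and it uses a Hoeffding upper confidence bound as one simple choice of high-confidence test.

```python
import numpy as np

def safety_test(g_hat, delta, a=-1.0, b=1.0):
    # Hoeffding (1 - delta) upper confidence bound on E[g_hat],
    # assuming each sample of g_hat lies in [a, b] (an assumption here).
    n = len(g_hat)
    ucb = np.mean(g_hat) + (b - a) * np.sqrt(np.log(1 / delta) / (2 * n))
    return ucb <= 0.0   # accept only if we are confident that g(theta) <= 0

def seldonian_sketch(candidates, f, g_hat_fn, D_candidate, D_safety, delta):
    # Pick the best candidate by the objective f on the candidate data,
    # then return it only if it passes the safety test on held-out data;
    # otherwise return "NSF" (No Solution Found), which a Seldonian
    # algorithm is explicitly allowed to do.
    best = max(candidates, key=lambda th: f(th, D_candidate))
    return best if safety_test(g_hat_fn(best, D_safety), delta) else "NSF"
```

The key structural feature is the "No Solution Found" outcome: rather than always returning its best guess, the algorithm may refuse when it cannot certify the constraint with the required confidence.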
Continues at https://science.sciencemag.org/content/366/6468/999 ___ A few observations from a layman (hoping to be corrected and to learn something new :-D). The fairness of the selected model is:
- probabilistic, so this framework cannot be taken as a guarantee;
- defined in terms of one or more functions (gi) fixed ex ante (whose definition is under the designer's control);
- these functions are in turn statistics (for example, "the mean prediction error for men is within ε = 0.05 of that for women").

If I have understood correctly, the Seldonian optimization
argmax_{a ∈ A} f(a)  s.t.  ∀i ∈ {1, …, n}, Pr(g_i(a(D)) ≤ 0) ≥ 1 − δ_i
selects the algorithms that keep the probability of violating each desired constraint low in isolation (within a certain error), but there is no guarantee that they do so for combinations of those constraints (for example, they might bound the MEAN bias with respect to sex, the MEAN bias with respect to ethnicity, and the MEAN bias with respect to age, yet still disadvantage or advantage elderly Roma women). The question matters because the article leaves to the user the choice of the relevant definitions of "fairness" FROM among those provided by the designer. As a consequence, this very interesting approach (which essentially makes certain "secondary" meta-properties probable during the selection of the function that approximates the unknown objective) cannot be used to produce decisions about people, since it remains inscrutable in any case. Giacomo
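[Editor's illustration of the intersectional concern raised above, with entirely invented numbers: a model whose mean error gap by sex and by ethnicity is exactly zero can still show a large gap for one specific intersection of the two attributes.]

```python
import numpy as np

# Toy per-subgroup error rates (invented numbers), equal subgroup sizes:
# rows = sex (0/1), columns = ethnicity (0/1).
err = np.array([[0.10, 0.30],
                [0.30, 0.10]])

gap_by_sex = abs(err[0].mean() - err[1].mean())        # marginal gap: 0
gap_by_eth = abs(err[:, 0].mean() - err[:, 1].mean())  # marginal gap: 0
gap_joint = err.max() - err.min()                      # intersectional gap: 0.2
```

Both marginal constraints would be satisfied with room to spare, while one intersection (sex 0, ethnicity 1) fares three times worse than another.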