MathDB
AGMC 2021 Prelim Q3

Source:

January 10, 2023
algorithm, function, analytic geometry, vector

Problem Statement

Last year, Master Cheung became famous for multi-rotation. This year, he has come to DAMO to make noodles for the sweeping monk. One day, the software engineer Xiao Li chats with Master Cheung about his job. Xiao Li mainly researches and designs algorithms for tuning the parameters of various products. These parameters are usually obtained by minimising a loss function $f$ on $\mathbb{R}^n$. In Xiao Li's most recent project, the loss function is supplied by another team. For security and technical reasons, that team cannot reveal the internal details of the function to Xiao Li; they only provide an interface that returns the value $f(\text{x})$ for any $\text{x}\in\mathbb{R}^n$. Therefore, Xiao Li must minimise $f$ using function values alone. Moreover, every evaluation of $f$ consumes considerable computing resources. Fortunately, the dimension $n$ is not very high (around $10$), and the colleague who provides the function tells Xiao Li that $f$ may be assumed to be smooth.

This problem reminds Master Cheung of his antique radio. To hear a programme on the radio, you must turn its knob carefully while paying attention to the quality of the reception, until the quality is at its best. Throughout this process, no one knows the exact relationship between the angle of the knob and the quality of the reception. Master Cheung and Xiao Li realise that minimising $f$ is just like tuning a machine with multiple knobs: suppose each coordinate of $\text{x}$ is controlled by one knob, and $f(\text{x})$ measures some performance of the machine. One only needs to adjust the knobs over and over while observing the value of $f$; with luck, the best $\text{x}$ can be found. The two therefore propose an iterative algorithm, named Automated Forward/Backward Tuning ($\text{AFBT}$), to minimise $f$.

In the $k$-th iteration, the algorithm perturbs each coordinate of $\text{x}_k$ forward and backward, producing the $2n$ points $\{\text{x}_k\pm t_k\text{e}^i : i=1,\dots,n\}$, where $t_k$ is the step size; it then lets $\text{y}_k$ be a point with the smallest function value among these points, and checks whether $\text{y}_k$ decreases $f$ sufficiently. If so, it sets $\text{x}_{k+1}=\text{y}_k$ and doubles the step size; otherwise, it sets $\text{x}_{k+1}=\text{x}_k$ and halves the step size. In the algorithm, $\text{e}^i$ denotes the $i$-th coordinate vector of $\mathbb{R}^n$, whose $i$-th entry is $1$ and all other entries are $0$; $\mathbf{1}(\cdot)$ is the indicator function: $\mathbf{1}(f(\text{x}_k)-f(\text{y}_k)\ge t_k^2)$ takes the value $1$ if $f(\text{x}_k)-f(\text{y}_k)$ is at least $t_k^2$, and $0$ otherwise.
$\text{AFBT}$ algorithm. Input: $\text{x}_0\in\mathbb{R}^n$, $t_0>0$. For $k=0,1,2,\dots$, perform the following loop:

1: $\text{y}_k:=\operatorname{argmin}\{f(\text{y}):\text{y}\in\{\text{x}_k\pm t_k\text{e}^i : i=1,\dots,n\}\}$ # Evaluate the loss function at the $2n$ trial points.

2: $s_k:=\mathbf{1}[f(\text{x}_k)-f(\text{y}_k)\ge t_k^2]$ # Sufficient decrease? Yes: $s_k=1$; no: $s_k=0$.

3: $\text{x}_{k+1}:=(1-s_k)\text{x}_k+s_k\text{y}_k$ # Update the iterate.

4: $t_{k+1}:=2^{2s_k-1}t_k$ # Update the step size: $s_k=1$ doubles it; $s_k=0$ halves it.
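For concreteness, here is a minimal Python sketch of the loop above. The names (`afbt`, `f`, `x0`, `t0`, `max_iter`) and the finite iteration cap are our own choices for illustration; the problem's loop itself runs indefinitely.

```python
import numpy as np

def afbt(f, x0, t0, max_iter=1000):
    """Automated Forward/Backward Tuning (AFBT), as specified above.

    f        : callable returning f(x) for x in R^n (function values only)
    x0       : starting point, shape (n,)
    t0       : initial step size, > 0
    max_iter : iteration cap (the problem's loop never terminates)
    """
    x, t = np.asarray(x0, dtype=float), float(t0)
    fx = f(x)
    n = x.size
    for _ in range(max_iter):
        # Step 1: evaluate f at the 2n trial points x_k +/- t_k e^i.
        candidates = [x + sign * t * e
                      for e in np.eye(n) for sign in (1.0, -1.0)]
        values = [f(y) for y in candidates]
        j = int(np.argmin(values))
        y, fy = candidates[j], values[j]
        # Step 2: sufficient decrease test, s_k = 1 iff f(x_k) - f(y_k) >= t_k^2.
        if fx - fy >= t * t:
            x, fx = y, fy   # Step 3: accept y_k ...
            t *= 2.0        # Step 4: ... and double the step size.
        else:
            t /= 2.0        # Steps 3-4: keep x_k and halve the step size.
    return x, fx
```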
Now, we make assumptions on the loss function $f:\mathbb{R}^n\to\mathbb{R}$.

Assumption 1. $f$ is convex: for any $\text{x},\text{y}\in\mathbb{R}^n$ and $\alpha\in[0,1]$, we have $f((1-\alpha)\text{x}+\alpha\text{y})\le(1-\alpha)f(\text{x})+\alpha f(\text{y})$.

Assumption 2. $f$ is differentiable on $\mathbb{R}^n$ and $\nabla f$ is $L$-Lipschitz continuous on $\mathbb{R}^n$.

Assumption 3. The level sets of $f$ are bounded: for any $\lambda\in\mathbb{R}$, the set $\{\text{x}\in\mathbb{R}^n : f(\text{x})\le\lambda\}$ is bounded.

Based on Assumptions 1 and 2, one can prove that for all $\text{x},\text{y}\in\mathbb{R}^n$,
$$\langle\nabla f(\text{x}),\text{y}-\text{x}\rangle \le f(\text{y})-f(\text{x}) \le \langle\nabla f(\text{x}),\text{y}-\text{x}\rangle + \frac{L}{2}\|\text{x}-\text{y}\|^2,$$
where the left inequality is the first-order characterisation of convexity and the right one is the standard descent lemma for $L$-Lipschitz gradients. You may refer to any convex analysis textbook for more properties of convex functions. Note that under Assumptions 1 and 3, $f$ attains its minimum value, which we denote by $f^*$.

Prove that under Assumptions 1-3, the iterates of $\text{AFBT}$ satisfy $\lim_{k\to\infty}f(\text{x}_k)=f^*$.
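As an empirical sanity check (not part of the problem), one can run the `afbt` sketch above on a function satisfying Assumptions 1-3, for instance $f(\text{x})=\|\text{x}-\text{x}^*\|^2$, which is convex with a $2$-Lipschitz gradient and bounded level sets, and observe $f(\text{x}_k)\to f^*=0$:

```python
import numpy as np

# f(x) = ||x - x_star||^2 satisfies Assumptions 1-3; its minimum is f* = 0.
rng = np.random.default_rng(0)
x_star = rng.standard_normal(10)          # n = 10, as in the story
f = lambda x: float(np.sum((x - x_star) ** 2))

x, fx = afbt(f, x0=np.zeros(10), t0=1.0, max_iter=500)
print(fx)  # should be close to f* = 0
```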