Anima Anandkumar: The road to autonomy: sample-efficient learning in control systems with built-in safety and stability guarantees
- Shared screen with speaker view

Boaz Barak

16:32

If you didn

Boaz Barak

16:53

As I mentioned initially, feel free to ask questions in the chat during the talk

Boaz Barak

24:13

Is the difference between AlphaGo and robotics that experiments are much more expensive in robotics?

Hemang Purohit

26:40

I think it's because AlphaGo has a fixed environment while a robot operates in a dynamic environment

Helen H. Yu

27:16

What is the difference between how people make decisions and how robots make decisions?

Vincent Zhang

28:11

There's a difference in discrete-time vs. continuous-time as well, I guess

Hemang Purohit

28:30

Yeah true

Mark Kong

30:25

Is there a theorem that says something like "there aren't adversarial examples for things like AlphaGo", or is that just a heuristic?

Hemang Purohit

32:49

Hi Anima, thank you for the talk. What do you think is a better approach: using end-to-end learning in robotics, or using existing methods and making them more adaptable (hybrid = traditional + learning)?

Hemang Purohit

33:09

Thank you

Vincent Zhang

34:14

In control, we typically prefer a smaller time step, e.g., to respect the inherent system frequency; whereas in RL, a smaller time step may cause trouble in learning, e.g., the Q function collapses. Seems to be some tradeoff there ;)
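
One standard way to see the collapse Vincent mentions, taken from the continuous-time RL literature rather than from the talk: with time step \delta and discount \gamma_\delta \approx 1 - \rho\delta,

```latex
Q_\delta(s,a) \;=\; r(s,a)\,\delta + \gamma_\delta\,\mathbb{E}\!\left[V(s')\right] \;=\; V(s) + O(\delta),
```

so as \delta \to 0 the action gap vanishes and greedy action selection from Q_\delta becomes ill-conditioned, whereas classical controllers typically only benefit from finer time steps.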

Hemang Purohit

52:33

This is great, thank you for making this talk available online, love it

Sanket

53:07

What are the sources of uncertainty in these models?

Sanket

54:10

But the model isn’t inherently stochastic?

Nikhil

55:53

Are there significant “device specific” uncertainties? Different sensor errors etc. If so, do you think Adaptive Learning will be able to perform better in such scenarios as opposed to just RL or just Control Systems?

Boaz Barak

57:01

Let's defer this to the next pause?

Nikhil

57:43

Sure, thanks.

Csaba

59:12

sound gone?

Vincent Zhang

59:32

Not for me

Hemang Purohit

59:52

sound is good for me

Feicheng Wang

01:00:11

Why is the regret defined w.r.t. the sensor outputs y_t instead of the internal states x_t?

Csaba

01:00:59

is there a price for this assumption?

Sahin Lale

01:02:49

In LQG, with a similarity transformation you can change the state representation, but the output of the system would be the same. That's why we picked the cost based on outputs rather than the internal state
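
A compact way to see the invariance Sahin describes, written in standard LQG notation (my reconstruction; the talk's symbols may differ): for the system

```latex
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + z_t,
```

a similarity transformation \tilde{x}_t = T x_t gives the realization (T A T^{-1}, T B, C T^{-1}) and leaves the output unchanged,

```latex
\tilde{y}_t = (C T^{-1})(T x_t) + z_t = C x_t + z_t = y_t,
```

so a cost (and hence regret) defined through y_t and u_t is invariant, while one defined through x_t would depend on the arbitrary choice of T.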

Yang Zheng

01:10:30

What is the assumption for the open-loop dynamics? Is it open-loop stable or unstable?

Sahin Lale

01:10:53

open-loop stable

Feicheng Wang

01:11:14

Is there any possible way to relax the stability assumption?

Yang Zheng

01:11:30

thanks! I think it is much harder to deal with open-loop unstable systems

Yang Zheng

01:12:23

I feel open-loop unstable systems are fundamentally harder to deal with...

Lucas Janson

01:14:22

Do your polylog regret bounds for LQG imply you can achieve polylog regret for the LQR problem as a special case?

J

01:16:02

^I think so, due to the separation principle

Sahin Lale

01:16:19

In LQR, the lack of noise in the observations makes the optimal controller that knows the system much stronger

Sahin Lale

01:17:18

So it won’t extend to LQR

Lucas Janson

01:17:34

OK thank you!

Yang Zheng

01:20:54

does this setup also work for LQR?

Yang Zheng

01:21:16

what if C = I?

Yang Zheng

01:24:28

thanks

Yang Zheng

01:26:54

Another question: which part in your procedure requires A to be stable, and why?

Sahin Lale

01:28:23

For the warm-up part, we inject independent Gaussian noise to get a good initial estimate for adaptive control
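
A minimal sketch of what such a warm-up phase could look like, assuming a generic open-loop-stable system and plain least-squares identification of the Markov parameters (an illustration in Python; the paper's actual procedure is more refined):

```python
import numpy as np

# Illustration only: excite an unknown open-loop-stable system with i.i.d.
# Gaussian inputs and recover the first H Markov parameters G_i = C A^{i-1} B
# by least squares. Truncating at H is fine here because A is stable.
rng = np.random.default_rng(0)

# Ground-truth system, hidden from the learner; spectral radius of A < 1.
A = np.array([[0.7, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

T_warm, H = 2000, 10                  # warm-up length, number of parameters
u = rng.normal(size=(T_warm, 1))      # injected independent Gaussian inputs
x = np.zeros(2)
Y, Phi = [], []
for t in range(T_warm):
    y = C @ x + 0.1 * rng.normal(size=1)      # noisy observation y_t
    if t >= H:
        Y.append(y)
        Phi.append(u[t - H:t][::-1].ravel())  # regressors u_{t-1}, ..., u_{t-H}
    x = A @ x + B @ u[t] + 0.1 * rng.normal(size=2)  # state update with noise

# Row i of G_hat estimates the Markov parameter G_{i+1} = C A^i B.
G_hat, *_ = np.linalg.lstsq(np.array(Phi), np.array(Y), rcond=None)
G_true = [(C @ np.linalg.matrix_power(A, i) @ B).item() for i in range(H)]
print(np.round(G_hat.ravel(), 3))
print(np.round(G_true, 3))
```

The injected inputs are independent of the process and measurement noise, so the regression stays consistent even though the noise realizations are never observed.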

Guangyi Liu

01:28:27

Is the combination of learning the control policy online and learning the model episodically optimal?

Sahin Lale

01:29:43

This part requires A to be stable. One alternative would be to assume that a stabilizing controller is given to us during warm-up; with that assumption we can also handle an unstable A, but it amounts to the same idea as assuming A is stable

Feicheng Wang

01:30:27

Later you made another assumption that \bar{A} is stable. Is that assumption usually more likely to hold than A itself being stable?

Sahin Lale

01:30:57

\bar{A} stability comes from the assumption of controllability and observability

Sahin Lale

01:31:05

so yes :)
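
The standard reasoning behind this, as I read it (a gloss, not a quote from the talk): controllability and observability guarantee stabilizing LQR and Kalman gains K and L, so the closed-loop blocks satisfy

```latex
\rho(A - BK) < 1, \qquad \rho(A - LC) < 1,
```

and since \bar{A} is assembled from these blocks, it is stable even when the open-loop A is not.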

Feicheng Wang

01:31:23

I see

Feicheng Wang

01:31:30

Thank you!

Csaba Szepesvari

01:32:35

So do I correctly understand that the length of the warm-up phase depends on unknown quantities? Any hope of removing those?

Sahin Lale

01:35:05

Yes, it depends on the unknown quantities in order to maintain the stability of the designed controller and the improvement in performance. Unfortunately it is hard to remove the warm-up step in adaptive control of LQG. We have thought of some possible ways, but it is still ongoing work

Csaba Szepesvari

01:36:27

Thanks.

Yang Zheng

01:36:28

do you assume the noise w_i is known or directly measurable?

Sahin Lale

01:39:31

To recap, none of the noise realizations are observed. We estimate the effect of the noises as nature's output using the Markov parameter estimates

Ron Rivest

01:39:44

How might this approach apply to learning/controlling the climate?

Ron Rivest

01:41:26

Yes, thanks.

Csaba Szepesvari

01:42:52

I guess a technical question is how u_t can be computed if it depends on b_t, which depends on w_t, which is not observed..

Sahin Lale

01:44:02

All outputs can be written in terms of nature's y plus the Markov parameters times the previous inputs

Csaba Szepesvari

01:44:26

Yeah.. which was not on the slides before.. (or maybe it was and I just missed it)

Sahin Lale

01:44:35

Since we know the past inputs, we use the Markov parameter estimates to estimate nature's y
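
In symbols (my reconstruction, assuming x_0 = 0 and the standard LQG form from earlier):

```latex
y_t \;=\; \underbrace{C \sum_{i=1}^{t} A^{i-1} w_{t-i} + z_t}_{\text{nature's } \bar{y}_t} \;+\; \sum_{i=1}^{t} \underbrace{C A^{i-1} B}_{G_i}\, u_{t-i},
```

so given the past inputs and the Markov parameter estimates \hat{G}_i, one can form \hat{\bar{y}}_t = y_t - \sum_i \hat{G}_i u_{t-i} without ever observing w_t, which is what keeps u_t computable.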

Csaba Szepesvari

01:44:57

Is there an observer working somewhere?

Csaba Szepesvari

01:45:07

w_t is state noise..

Csaba Szepesvari

01:45:33

:)

Sahin Lale

01:45:57

this doesn’t remove the dependence on the history

Csaba Szepesvari

01:46:11

Ok, I guess it’s coming here..

Sahin Lale

01:47:23

In the controller design we use M and the estimate of nature's y, which captures the effect of all the noise observed in the past
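
Concretely, the policy class I believe is being referenced here (my reading; notation may differ from the slides) chooses the input as a finite linear map of the estimated nature's outputs:

```latex
u_t \;=\; \sum_{i=1}^{H} M^{[i]}\, \hat{\bar{y}}_{t-i},
```

which is convex in the parameters M = (M^{[1]}, ..., M^{[H]}), so an online convex optimization method can be run over M while the Markov parameter estimates supply \hat{\bar{y}}.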

Kaiqing Zhang

01:47:27

@csaba I guess this Kalman filter is the observer you mentioned

Dongsheng Ding

01:50:05

Does the input persistent excitation (PE) carry over to the output PE?

Csaba Szepesvari

01:50:44

@Kaiqing: yes..

Sahin Lale

01:51:07

Yes, we obtain M such that it satisfies this

Sahin Lale

01:52:06

We assume that the noises in the system have full-rank covariance matrices

Sahin Lale

01:52:23

this combination would achieve PE
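
For reference, persistent excitation (PE) of the regressors \phi_s feeding a least-squares estimator is usually formalized as a lower bound on the data covariance (the standard definition, not specific to this talk):

```latex
\lambda_{\min}\!\left(\sum_{s=1}^{t} \phi_s \phi_s^\top\right) \;\geq\; \sigma\, t \quad \text{for some } \sigma > 0,
```

i.e., the closed-loop data keeps exciting every direction of the parameter space at a linear rate, which is what drives the identification error down.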

Csaba Szepesvari

01:53:37

Are we still injecting noise here in this phase?

Sahin Lale

01:54:00

after warm-up we stop injecting noise

Csaba Szepesvari

01:54:09

Ok, this is what I thought

Csaba Szepesvari

01:55:02

The inevitable question is whether identification will be possible without PE. And if not, how come we don't pay for this in the regret?

Csaba Szepesvari

01:55:58

I guess, FTRL in the presence of nice structure achieves low regret..

Csaba Szepesvari

01:56:13

Very nice!

Cathy Wu

01:56:20

What is “PE”?

Cathy Wu

01:56:33

Ah got it, thank you

Csaba Szepesvari

01:57:24

Oh this was interesting

Csaba Szepesvari

01:57:52

Is there a way to remove the assumption that the optimal controller is PE? Or does this happen pretty much all the time?

Dongsheng Ding

01:59:07

There is a rich literature on recursive least squares. Can we draw a connection to it? They could achieve logarithmic regret with PE.

Sahin Lale

02:00:09

It is a little tricky. It is usually the case, especially in LQG control, since the optimal controller is constructed from a bunch of matrices. It is hard to explain in chat :) but in the paper we define explicitly what is required to obtain PE for the optimal controller of the underlying model

Csaba Szepesvari

02:00:18

On this slide, is f known?

Sahin Lale

02:00:38

However, in LQR it is not possible since K is usually rank-deficient
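
To unpack that remark (my gloss): in LQR the optimal policy is static state feedback,

```latex
u_t = -K x_t, \qquad K \in \mathbb{R}^{m \times n},
```

so with noiseless observations the inputs are a fixed, typically rank-deficient function of the state, contribute no independent excitation, and the closed-loop data need not excite all the directions required to identify the model.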

Anqi Liu

02:02:23

yes, f is known and \tilde{f} is unknown

Csaba Szepesvari

02:02:45

Thanks Sahin for all the answers:) @anqi: This would make sense to me

Sahin Lale

02:03:28

@all Thanks a lot for all insightful questions :)

Anirudh Ramesh

02:12:12

Thank you so much for this amazing talk. I'm an incoming PhD student in decision making under uncertainty. What book would you recommend to an aspiring researcher at the intersection of optimal control and reinforcement learning?

Chris

02:16:01

thank you, Prof Anima!

Sahin Lale

02:16:13

Bandit Algorithms by Csaba Szepesvari and Tor Lattimore is a great read to get the general ideas in decision making under uncertainty

Csaba Szepesvari

02:16:31

Spoiler alert ^^^ :)

Hemang Purohit

02:16:34

Thank you Dr Anima, that was a very insightful talk

Csaba Szepesvari

02:17:01

It is a small subset

Hemang Purohit

02:17:43

Is there any Caltech resource online for learning about these methods?

Csaba Szepesvari

02:18:24

Books by Bertsekas:)

Sahin Lale

02:18:37

cannot agree more :)

Anirudh Ramesh

02:18:39

What would it take for RL-based systems (in terms of robustness) to be applied to safety-critical systems like autonomous vehicles? Where do you see this field going in the future?

Anirudh Ramesh

02:20:54

Thank you so much for this talk, it was really inspiring to me as an aspiring researcher

Csaba Szepesvari

02:20:59

Clap-clap!!

Csaba Szepesvari

02:21:16

Thank you guys for the seminar!

Wei Lu

02:21:20

Thank you!

Vincent Zhang

02:21:23

Claps!! Awesome work. Thanks for the talk

Hemang Purohit

02:21:34

Thanks for the talk