Anima Anandkumar: The road to autonomy: sample-efficient learning in control systems with built-in safety and stability guarantees
Boaz Barak
16:53
As I mentioned initially, feel free to ask questions on the chat during the talk
Boaz Barak
24:13
Is the difference between AlphaGo and robotics that experiments are much more expensive in robotics?
Hemang Purohit
26:40
I think it's because AlphaGo has a fixed environment while a robot operates in a dynamic environment
Helen H. Yu
27:16
What is the difference between how people make decisions and how robots make decisions?
Vincent Zhang
28:11
difference in discrete-time VS continuous-time as well I guess
Hemang Purohit
28:30
Yeah true
Mark Kong
30:25
Is there a theorem that says something like "there aren't adversarial examples for things like AlphaGo", or is that just a heuristic?
Hemang Purohit
32:49
Hi Anima, thank you for the talk. What do you think is a better approach: using end-to-end learning in robotics, or using existing methods and making them more adaptable (hybrid = traditional + learning)?
Hemang Purohit
33:09
Thank you
Vincent Zhang
34:14
In control, we typically prefer a smaller time duration, e.g. to respect the inherent system frequency; whereas in RL, a smaller time duration may cause trouble in learning, e.g. the Q function collapses, etc. Seems to be some tradeoff there ;)
Hemang Purohit
52:33
This is great, thank you for putting this talk online, love it
Sanket
53:07
What are the sources of uncertainty in these models?
Sanket
54:10
But the model isn’t inherently stochastic?
Nikhil
55:53
Are there significant “device specific” uncertainties? Different sensor errors etc. If so, do you think Adaptive Learning will be able to perform better in such scenarios as opposed to just RL or just Control Systems?
Boaz Barak
57:01
Let's defer this to the next pause?
Nikhil
57:43
Sure, thanks.
Csaba
59:12
sound gone?
Vincent Zhang
59:32
Not for me
Hemang Purohit
59:52
sound is good for me
Feicheng Wang
01:00:11
Why is the regret defined w.r.t sensors y_t instead of internal states x_t?
Csaba
01:00:59
is there a price for this assumption?
Sahin Lale
01:02:49
In LQG, with a similarity transformation you can change the state representation, but the output of the system stays the same. That's why we picked the cost based on outputs rather than the internal state.
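(A minimal sketch of the invariance being referred to, assuming the standard LQG state-space form x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + z_t, notation not from the talk: for any invertible T, the realization (T A T^{-1}, T B, C T^{-1}) with state T x_t produces exactly the same outputs y_t for the same inputs and noise, so a cost defined on y_t is representation independent, while a cost on the internal state x_t is not.)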
Yang Zheng
01:10:30
What is the assumption for the open-loop dynamics? Is it open-loop stable or unstable?
Sahin Lale
01:10:53
open-loop stable
Feicheng Wang
01:11:14
Is there any possible way to relax the stability assumption?
Yang Zheng
01:11:30
thanks! I think it is much harder to deal with open-loop unstable systems
Yang Zheng
01:12:23
I feel open-loop unstable systems are fundamentally harder to deal with...
Lucas Janson
01:14:22
Do your polylog regret bounds for LQG imply you can achieve polylog regret for the LQR problem as a special case?
J
01:16:02
^I think so, due to the separation principle
Sahin Lale
01:16:19
In LQR, the lack of noise in the observations makes the optimal controller that knows the system much stronger
Sahin Lale
01:17:18
So it won’t extend to LQR
Lucas Janson
01:17:34
OK thank you!
Yang Zheng
01:20:54
does this setup also work for LQR?
Yang Zheng
01:21:16
what if C = I?
Yang Zheng
01:24:28
thanks
Yang Zheng
01:26:54
Another question: which part in your procedure requires A to be stable, and why?
Sahin Lale
01:28:23
For the warm-up part, we inject independent Gaussian noise to get a good initial estimate for adaptive control
Guangyi Liu
01:28:27
Is the combination of learning the control policy online and learning the model episodically optimal?
Sahin Lale
01:29:43
This part requires A to be stable. One method would be to assume that a stabilizing controller is given to us during warm-up. With this assumption we can also handle unstable A, but it reduces to the same idea as assuming A is stable.
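(A hedged sketch of what such a warm-up could look like under the stated open-loop-stable assumption; the horizon T_w, the noise level \sigma_u, and the least-squares step are illustrative, not taken from the talk: for t = 1, ..., T_w apply pure exploration inputs u_t = \nu_t with \nu_t ~ N(0, \sigma_u^2 I) i.i.d., record the input-output pairs (u_t, y_t), and regress y_t on the recent inputs u_{t-1}, ..., u_{t-H} by regularized least squares to obtain initial Markov parameter estimates. Stability of A keeps the outputs bounded, and the injected Gaussian noise provides the excitation needed for a reliable initial model.)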
Feicheng Wang
01:30:27
Later you made another assumption that \bar{A} is stable? Is that assumption usually more likely to hold than A itself being stable?
Sahin Lale
01:30:57
\bar{A} stability comes from the assumption of controllability and observability
Sahin Lale
01:31:05
so yes :)
Feicheng Wang
01:31:23
I see
Feicheng Wang
01:31:30
Thank you!
Csaba Szepesvari
01:32:35
So do I correctly understand that the length of the warm-up phase depends on unknown quantities? Any hope of removing those?
Sahin Lale
01:35:05
Yes, it depends on the unknown quantities, to maintain the stability of the designed controller and the improvement in performance. Unfortunately it is hard to remove the warm-up step in adaptive control of LQG. We thought of some possible ways, but it is still ongoing work
Csaba Szepesvari
01:36:27
Thanks.
Yang Zheng
01:36:28
do you assume the noise w_i is known or directly measurable?
Sahin Lale
01:39:31
To recap, none of the noise realizations are observed. We estimate the effect of the noise as nature's output using the Markov parameter estimates
Ron Rivest
01:39:44
How might this approach apply to learning/controlling the climate?
Ron Rivest
01:41:26
Yes, thanks.
Csaba Szepesvari
01:42:52
I guess a technical question is how u_t can be computed if it depends on b_t which depends on w_t which is not observed..
Sahin Lale
01:44:02
All outputs can be written in terms of nature's y plus the Markov parameters times the previous inputs
Csaba Szepesvari
01:44:26
Yeah... which was not on the slides before (or maybe it was and I just missed it)
Sahin Lale
01:44:35
Since we know the past inputs, we use the Markov parameter estimates to estimate nature's y
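(A worked form of this decomposition, under the standard LQG dynamics sketched above: y_t = b_t + \sum_{i=1}^{t} G_i u_{t-i}, where G_i = C A^{i-1} B are the Markov parameters and b_t = C A^t x_0 + \sum_{i=1}^{t} C A^{i-1} w_{t-i} + z_t is nature's y, i.e., the output the system would have produced under zero input. Since the past inputs are known, plugging the estimated Markov parameters into this identity gives an estimate of b_t without ever observing w_t.)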
Csaba Szepesvari
01:44:57
Is there an observer working somewhere?
Csaba Szepesvari
01:45:07
w_t is state noise..
Csaba Szepesvari
01:45:33
:)
Sahin Lale
01:45:57
this doesn’t remove the dependence on the history
Csaba Szepesvari
01:46:11
Ok, I guess it’s coming here..
Sahin Lale
01:47:23
In the controller design we use M and the estimate of nature's y, which captures the effect of all the noise observed in the past
Kaiqing Zhang
01:47:27
@csaba I guess this Kalman filter is the observer you mentioned
Dongsheng Ding
01:50:05
Does persistent excitation of the input carry over to persistent excitation of the output?
Csaba Szepesvari
01:50:44
@Kaiqing: yes..
Sahin Lale
01:51:07
yes we obtain M such that it satisfies this
Sahin Lale
01:52:06
We assume that the noises in the system have full-rank covariance matrices
Sahin Lale
01:52:23
this combination would achieve PE
Csaba Szepesvari
01:53:37
Are we still injecting noise here in this phase?
Sahin Lale
01:54:00
after warm-up we stop injecting noise
Csaba Szepesvari
01:54:09
Ok, this is what I thought
Csaba Szepesvari
01:55:02
The inevitable question is whether identification will be possible without PE. And if not, how come we don’t pay for this in the regret.
Csaba Szepesvari
01:55:58
I guess, FTRL in the presence of nice structure achieves low regret..
Csaba Szepesvari
01:56:13
Very nice!
Cathy Wu
01:56:20
What is “PE”?
Cathy Wu
01:56:33
Ah got it, thank you
Csaba Szepesvari
01:57:24
Oh this was interesting
Csaba Szepesvari
01:57:52
Is there a way to remove the assumption that the optimal controller is PE? Or does this happen pretty much all the time?
Dongsheng Ding
01:59:07
There is a rich literature on recursive LS. Can we draw a connection to it? They could achieve logarithmic regret with PE.
Sahin Lale
02:00:09
It is a little tricky. It is usually the case, especially in LQG control, since the optimal controller is constructed from a bunch of matrices. It is hard to explain in chat :) but in the paper we define explicitly what is required to obtain PE for the optimal control of the underlying model
Csaba Szepesvari
02:00:18
On this slide, is f known?
Sahin Lale
02:00:38
However, in LQR it is not possible since K is usually rank-deficient
Anqi Liu
02:02:23
yes, f is known and \tilde{f} is unknown
Csaba Szepesvari
02:02:45
Thanks Sahin for all the answers:) @anqi: This would make sense to me
Sahin Lale
02:03:28
@all Thanks a lot for all insightful questions :)
Anirudh Ramesh
02:12:12
Thank you so much for this amazing talk. I'm an incoming PhD student in decision making under uncertainty. What book would you recommend to an aspiring researcher in the intersection of optimal control and reinforcement learning?
Chris
02:16:01
thank you, Prof Anima!
Sahin Lale
02:16:13
Bandit Algorithms by Csaba Szepesvari and Tor Lattimore is a great read to get the general idea of decision making under uncertainty
Csaba Szepesvari
02:16:31
Spoiler alert ^^^ :)
Hemang Purohit
02:16:34
Thank you Dr Anima, that was a very insightful talk
Csaba Szepesvari
02:17:01
It is a small subset
Hemang Purohit
02:17:43
Is there any Caltech resource online for learning about these methods?
Csaba Szepesvari
02:18:24
Books by Bertsekas:)
Sahin Lale
02:18:37
cannot agree more :)
Anirudh Ramesh
02:18:39
What would it take for RL-based systems (in terms of robustness) to be applied to safety-critical systems like autonomous vehicles? Where do you see this field going in the future?
Anirudh Ramesh
02:20:54
Thank you so much for this talk, it was really inspiring to me, as an aspiring researcher
Csaba Szepesvari
02:20:59
Clap-clap!!
Csaba Szepesvari
02:21:16
Thank you guys for the seminar!
Wei Lu
02:21:20
Thank you!
Vincent Zhang
02:21:23
Claps!! Awesome work. Thanks for the talk
Hemang Purohit
02:21:34
Thanks for the talk