DERIVATION OF BELLMAN'S PDE-DYNAMIC PROGRAMMING



Date: 16-10-2016, 01:29 PM

Author: Lawrence C. Evans

Book or Source: An Introduction to Mathematical Optimal Control Theory

Pages: 72-77


We begin with some mathematical wisdom: “It is sometimes easier to solve a problem by embedding it within a larger class of problems and then solving the larger class all at once.”

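A CALCULUS EXAMPLE. Suppose we wish to calculate the value of the integral

$$\int_0^\infty \frac{\sin x}{x}\,dx.$$

This is pretty hard to do directly, so let us add a parameter α into the integral as follows:

$$I(\alpha) := \int_0^\infty e^{-\alpha x}\,\frac{\sin x}{x}\,dx.$$

We compute

$$I'(\alpha) = \int_0^\infty (-x)\,e^{-\alpha x}\,\frac{\sin x}{x}\,dx = -\int_0^\infty e^{-\alpha x}\sin x\,dx = -\frac{1}{\alpha^2+1},$$

where we integrated by parts twice to find the last equality. Consequently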

I(α) = −arctan α + C,

and we must compute the constant C. To do so, observe that

0 = I(∞) = −arctan(∞) + C = −π/2 + C,

and so C = π/2. Hence I(α) = −arctan α + π/2, and consequently

$$\int_0^\infty \frac{\sin x}{x}\,dx = I(0) = \frac{\pi}{2}.$$
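The formula I(α) = π/2 − arctan α can be sanity-checked numerically. The sketch below uses only the Python standard library. It approximates I(α) for a few values α > 0 with composite Simpson's rule on the truncated interval [0, 60]. The cutoff, the step count and the function names `integrand` and `I` are illustrative choices, not part of the text.

```python
import math


def integrand(x: float, alpha: float) -> float:
    """e^(-alpha*x) * sin(x)/x, using the limiting value 1 at x = 0."""
    if x == 0.0:
        return 1.0
    return math.exp(-alpha * x) * math.sin(x) / x


def I(alpha: float, upper: float = 60.0, n: int = 6000) -> float:
    """Composite Simpson's rule for I(alpha) on [0, upper]; n must be even.

    For alpha > 0 the neglected tail beyond `upper` is at most
    e^(-alpha*upper) / (alpha*upper), which is negligible here.
    """
    h = upper / n
    total = integrand(0.0, alpha) + integrand(upper, alpha)
    for k in range(1, n):
        weight = 4 if k % 2 == 1 else 2  # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * integrand(k * h, alpha)
    return total * h / 3


for alpha in (0.5, 1.0, 2.0):
    exact = math.pi / 2 - math.atan(alpha)
    print(f"alpha={alpha}: Simpson {I(alpha):.8f}, pi/2 - arctan(alpha) {exact:.8f}")
```

The check is restricted to α > 0. At α = 0 the integrand sin x / x is not absolutely integrable on [0, ∞), so a truncated quadrature converges only slowly. That is one reason the detour through the parameter α is convenient.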

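We want to adapt some version of this idea to the vastly more complicated setting of control theory. For this, fix a terminal time T > 0 and then look at the controlled dynamics

$$\begin{cases} \dot{x}(s) = f(x(s), \alpha(s)) & (0 < s < T) \\ x(0) = x^0, \end{cases}$$

with the associated payoff functional

$$P[\alpha(\cdot)] = \int_0^T r(x(s), \alpha(s))\,ds + g(x(T)).$$

We embed this into a larger family of similar problems, by varying the starting times and starting points:

$$\begin{cases} \dot{x}(s) = f(x(s), \alpha(s)) & (t < s < T) \\ x(t) = x, \end{cases} \tag{1.1}$$

with

$$P_{x,t}[\alpha(\cdot)] = \int_t^T r(x(s), \alpha(s))\,ds + g(x(T)). \tag{1.2}$$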

Consider the above problems for all choices of starting times 0 ≤ t ≤ T and all initial points x ∈ Rⁿ.

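DEFINITION. For x ∈ Rⁿ, 0 ≤ t ≤ T, define the value function v(x, t) to be the greatest payoff possible if we start at x ∈ Rⁿ at time t. In other words,

$$v(x,t) := \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)] \qquad (x \in \mathbb{R}^n,\ 0 \le t \le T). \tag{1.3}$$

Notice then that

$$v(x,T) = g(x) \qquad (x \in \mathbb{R}^n). \tag{1.4}$$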
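To make (1.1)-(1.3) concrete, the sketch below evaluates the payoff P_{x,t}[α(·)] by forward Euler for a hypothetical one-dimensional example that is not in the text: A = [−2, 2], f(x, a) = a, r(x, a) = −(x² + a²), g(x) = 0 and T = 1. For this linear-quadratic toy problem one can check, via (HJB) and the verification argument of Theorem 1.2, that v(x, t) = −tanh(T − t)x² for |x| ≤ 2. From (1, 0) this value is attained by the open-loop control α(s) = −sinh(T − s)/cosh T. Both formulas are taken as assumptions here, and the names `payoff` and `v_exact` are illustrative.

```python
import math
from typing import Callable

# Hypothetical toy problem, not taken from the text:
# n = 1, A = [-2, 2], f(x, a) = a, r(x, a) = -(x^2 + a^2), g(x) = 0, T = 1.
T = 1.0
A_LOW, A_HIGH = -2.0, 2.0


def f(x: float, a: float) -> float:
    return a


def r(x: float, a: float) -> float:
    return -(x * x + a * a)


def g(x: float) -> float:
    return 0.0


def payoff(x: float, t: float, control: Callable[[float], float], steps: int = 2000) -> float:
    """Approximate P_{x,t}[alpha(.)] of (1.2) along the dynamics (1.1).

    Uses forward Euler for the state and a left Riemann sum for the
    running payoff; control values are clipped into the compact set A.
    """
    h = (T - t) / steps
    state, running = x, 0.0
    for k in range(steps):
        s = t + k * h
        a = min(A_HIGH, max(A_LOW, control(s)))
        running += h * r(state, a)
        state += h * f(state, a)
    return running + g(state)


def v_exact(x: float, t: float) -> float:
    """Assumed closed-form value function (1.3) of the toy problem, |x| <= 2."""
    return -math.tanh(T - t) * x * x


candidates = {
    "a = 0": lambda s: 0.0,
    "a = -3/8": lambda s: -0.375,
    "optimal": lambda s: -math.sinh(T - s) / math.cosh(T),
}
for name, control in candidates.items():
    print(f"{name:>9}: P = {payoff(1.0, 0.0, control):.4f}")
print(f"  v(1, 0) = {v_exact(1.0, 0.0):.4f}")
```

Both constant controls fall below v(1, 0) = −tanh 1 ≈ −0.7616 (exactly −1 and about −0.81). The last control matches v(1, 0) up to discretization error, which is consistent with v being the supremum in (1.3).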

1.2 DERIVATION OF HAMILTON-JACOBI-BELLMAN EQUATION.

Our first task is to show that the value function v satisfies a certain nonlinear partial differential equation.

Our derivation will be based upon the reasonable principle that “it’s better to be smart from the beginning, than to be stupid for a time and then become smart”.

We want to convert this philosophy of life into mathematics.

To simplify, we hereafter suppose that the set A of control parameter values is compact.


THEOREM 1.1 (HAMILTON-JACOBI-BELLMAN EQUATION). Assume that the value function v is a C¹ function of the variables (x, t). Then v solves the nonlinear partial differential equation

$$v_t(x,t) + \max_{a \in A}\{f(x,a)\cdot\nabla_x v(x,t) + r(x,a)\} = 0 \qquad (x \in \mathbb{R}^n,\ 0 \le t < T), \tag{HJB}$$

with the terminal condition

v(x, T) = g(x) (x ∈ Rⁿ).

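REMARK. We call (HJB) the Hamilton–Jacobi–Bellman equation, and can rewrite it as

$$v_t(x,t) + H(x, \nabla_x v(x,t)) = 0 \qquad (x \in \mathbb{R}^n,\ 0 \le t < T), \tag{HJB}$$

for the partial differential equations Hamiltonian

$$H(x,p) := \max_{a \in A}\{f(x,a)\cdot p + r(x,a)\},$$

where x, p ∈ Rⁿ.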
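As a worked illustration, consider a hypothetical example that is not from the text: n = 1, A = [−2, 2], f(x, a) = a, r(x, a) = −(x² + a²) and g ≡ 0. Completing the square in a gives the Hamiltonian in closed form, with maximizer a = p/2 whenever |p| ≤ 4:

$$H(x,p) = \max_{|a| \le 2}\{pa - x^2 - a^2\} = \begin{cases} \dfrac{p^2}{4} - x^2, & |p| \le 4, \\[4pt] 2|p| - 4 - x^2, & |p| > 4. \end{cases}$$

The candidate v(x, t) = −tanh(T − t)x² can then be checked against (HJB) for |x| ≤ 2. Here

$$v_t(x,t) = \operatorname{sech}^2(T-t)\,x^2, \qquad \nabla_x v(x,t) = -2\tanh(T-t)\,x,$$

and |∇ₓv| ≤ 2|x| ≤ 4, so the first branch of H applies:

$$v_t + H(x, \nabla_x v) = \operatorname{sech}^2(T-t)\,x^2 + \tanh^2(T-t)\,x^2 - x^2 = 0,$$

because sech² + tanh² = 1. The terminal condition also holds, since v(x, T) = 0 = g(x).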

Proof. 1. Let x ∈ Rⁿ, 0 ≤ t < T, and let h > 0 be given. As always

$$\mathcal{A} = \{\alpha(\cdot) : [0,\infty) \to A \text{ measurable}\}.$$

Pick any parameter a ∈ A and use the constant control

$$\alpha(\cdot) \equiv a$$

for times t ≤ s ≤ t + h. The dynamics then arrive at the point x(t + h), where t + h < T. Suppose now that at time t + h we switch to an optimal control and use it for the remaining times t + h ≤ s ≤ T.

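What is the payoff of this procedure? Now for times t ≤ s ≤ t + h, we have

$$\begin{cases} \dot{x}(s) = f(x(s), a) \\ x(t) = x. \end{cases}$$

The payoff for this time period is

$$\int_t^{t+h} r(x(s), a)\,ds.$$

Furthermore, the payoff incurred from time t + h to T is v(x(t + h), t + h), according to the definition of the value function v. Hence the total payoff is

$$\int_t^{t+h} r(x(s), a)\,ds + v(x(t+h), t+h).$$

But the greatest possible payoff if we start from (x, t) is v(x, t). Therefore

$$v(x,t) \ge \int_t^{t+h} r(x(s), a)\,ds + v(x(t+h), t+h). \tag{1.5}$$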
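Inequality (1.5) can be checked directly on a hypothetical toy problem that is not from the text: n = 1, A = [−2, 2], f(x, a) = a, r(x, a) = −(x² + a²), g ≡ 0, T = 1. The value function is assumed to be v(x, t) = −tanh(T − t)x² for |x| ≤ 2. Because f(x, a) = a, the state under a constant control is x + a(s − t), so the running payoff over [t, t + h] has an exact closed form. The sketch scans a fine grid of parameters a ∈ A, and the names `running_payoff` and `dp_right_hand_side` are illustrative.

```python
import math

# Hypothetical toy problem (not from the text): n = 1, A = [-2, 2],
# f(x, a) = a, r(x, a) = -(x^2 + a^2), g = 0, T = 1, with the assumed
# value function v(x, t) = -tanh(T - t) x^2, valid for |x| <= 2.
T = 1.0
A_GRID = [-2.0 + 4.0 * i / 4000 for i in range(4001)]  # grid spacing 0.001


def v(x: float, t: float) -> float:
    return -math.tanh(T - t) * x * x


def running_payoff(x: float, a: float, h: float) -> float:
    """Exact integral of r(x(s), a) over [t, t + h] when x(s) = x + a (s - t)."""
    # integral_0^h (x + a u)^2 du = x^2 h + x a h^2 + a^2 h^3 / 3, plus a^2 h from the a^2 term
    return -(x * x * h + x * a * h * h + a * a * h ** 3 / 3 + a * a * h)


def dp_right_hand_side(x: float, t: float, a: float, h: float) -> float:
    """Right-hand side of (1.5): constant control a on [t, t + h], then act optimally."""
    return running_payoff(x, a, h) + v(x + a * h, t + h)


x, t, h = 1.0, 0.0, 0.1
rhs = [dp_right_hand_side(x, t, a, h) for a in A_GRID]
print(f"v(x, t)           = {v(x, t):.6f}")
print(f"max over a of RHS = {max(rhs):.6f}")
```

Every parameter value on the grid respects (1.5). The best constant control falls short of v(x, t) only by an amount that is small compared with h, so the gap disappears after dividing by h and letting h → 0.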

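2. We now want to convert this inequality into a differential form. So we rearrange (1.5) and divide by h > 0:

$$\frac{v(x(t+h), t+h) - v(x,t)}{h} + \frac{1}{h}\int_t^{t+h} r(x(s), a)\,ds \le 0.$$

Let h → 0:

$$v_t(x,t) + \nabla_x v(x(t), t)\cdot\dot{x}(t) + r(x(t), a) \le 0.$$

But x(·) solves the ODE

$$\begin{cases} \dot{x}(s) = f(x(s), a) & (t \le s \le t+h) \\ x(t) = x. \end{cases}$$

Employ this above, to discover:

$$v_t(x,t) + f(x,a)\cdot\nabla_x v(x,t) + r(x,a) \le 0.$$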

This inequality holds for all control parameters a ∈ A, and consequently

$$\max_{a \in A}\{v_t(x,t) + f(x,a)\cdot\nabla_x v(x,t) + r(x,a)\} \le 0. \tag{1.6}$$

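3. We next demonstrate that in fact the maximum above equals zero. To see this, suppose α*(·), x*(·) were optimal for the problem above. Let us utilize the optimal control α*(·) for t ≤ s ≤ t + h. The payoff is

$$\int_t^{t+h} r(x^*(s), \alpha^*(s))\,ds$$

and the remaining payoff is v(x*(t + h), t + h). Consequently, the total payoff is

$$\int_t^{t+h} r(x^*(s), \alpha^*(s))\,ds + v(x^*(t+h), t+h) = v(x,t).$$

Rearrange and divide by h:

$$\frac{v(x^*(t+h), t+h) - v(x,t)}{h} + \frac{1}{h}\int_t^{t+h} r(x^*(s), \alpha^*(s))\,ds = 0.$$

Let h → 0 and suppose α*(t) = a* ∈ A. Then

$$v_t(x,t) + \nabla_x v(x,t)\cdot\dot{x}^*(t) + r(x, a^*) = 0,$$

where ẋ*(t) = f(x*(t), α*(t)) = f(x, a*). Therefore

$$v_t(x,t) + f(x,a^*)\cdot\nabla_x v(x,t) + r(x,a^*) = 0$$

for some parameter value a* ∈ A. Combined with (1.6), this shows that the maximum in (1.6) equals zero and is attained at a*. This proves (HJB).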
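Steps 2 and 3 can be illustrated numerically. The sketch uses a hypothetical toy problem that is not from the text: n = 1, A = [−2, 2], f(x, a) = a, r(x, a) = −(x² + a²), g ≡ 0, T = 1. It approximates v_t and ∇ₓv by central differences of the assumed closed form v(x, t) = −tanh(T − t)x² (valid for |x| ≤ 2). It then evaluates v_t + f(x, a)·∇ₓv + r(x, a) over a grid of a ∈ A. The helper names `partials` and `hjb_terms` are illustrative.

```python
import math

# Hypothetical toy problem (not from the text); v is an assumed closed form
# of the value function, valid for |x| <= 2.
T = 1.0
A_GRID = [-2.0 + 4.0 * i / 800 for i in range(801)]  # grid spacing 0.005


def f(x: float, a: float) -> float:
    return a


def r(x: float, a: float) -> float:
    return -(x * x + a * a)


def v(x: float, t: float) -> float:
    return -math.tanh(T - t) * x * x


def partials(x: float, t: float, d: float = 1e-5) -> tuple[float, float]:
    """Central-difference approximations of (v_t, v_x) at (x, t)."""
    vt = (v(x, t + d) - v(x, t - d)) / (2 * d)
    vx = (v(x + d, t) - v(x - d, t)) / (2 * d)
    return vt, vx


def hjb_terms(x: float, t: float) -> list[tuple[float, float]]:
    """Pairs (a, v_t + f(x, a) v_x + r(x, a)) for every grid value a."""
    vt, vx = partials(x, t)
    return [(a, vt + f(x, a) * vx + r(x, a)) for a in A_GRID]


x, t = 0.8, 0.3
terms = hjb_terms(x, t)
a_best, best = max(terms, key=lambda pair: pair[1])
print(f"largest term {best:.2e} at a = {a_best:.3f}")
print(f"unconstrained maximizer -tanh(T - t) x = {-math.tanh(T - t) * x:.3f}")
```

Every term is non-positive, as (1.6) asserts. The largest term is zero up to grid and finite-difference error, and it is attained near a* = −tanh(T − t)x, in line with Step 3.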

1.3 THE DYNAMIC PROGRAMMING METHOD

Here is how to use the dynamic programming method to design optimal controls:

Step 1: Solve the Hamilton–Jacobi–Bellman equation, and thereby compute the value function v.

Step 2: Use the value function v and the Hamilton–Jacobi–Bellman PDE to design an optimal feedback control α*(·), as follows. Define for each point x ∈ Rⁿ and each time 0 ≤ t ≤ T

$$\alpha(x,t) = a \in A$$

to be a parameter value where the maximum in (HJB) is attained. In other words, we select α(x, t) so that

$$v_t(x,t) + f(x, \alpha(x,t))\cdot\nabla_x v(x,t) + r(x, \alpha(x,t)) = 0.$$

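Next we solve the following ODE, assuming α(·, t) is sufficiently regular to let us do so:

$$\begin{cases} \dot{x}^*(s) = f(x^*(s), \alpha(x^*(s), s)) & (t \le s \le T) \\ x^*(t) = x. \end{cases}$$

Finally, define the feedback control

$$\alpha^*(s) := \alpha(x^*(s), s). \tag{1.7}$$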
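Here is a minimal sketch of Step 2, assuming Step 1 has already produced the value function. The example is a hypothetical toy problem that is not from the text: n = 1, A = [−2, 2], f(x, a) = a, r(x, a) = −(x² + a²), g ≡ 0, T = 1, with v(x, t) = −tanh(T − t)x² taken as given for |x| ≤ 2. The maximization over the compact set A is done by brute force over a finite grid, and the ODE for x*(·) is solved by forward Euler. Both are illustrative stand-ins, not methods prescribed by the text, and `feedback` and `closed_loop` are hypothetical names.

```python
import math

# Hypothetical toy problem (not from the text): n = 1, A = [-2, 2],
# f(x, a) = a, r(x, a) = -(x^2 + a^2), g = 0, T = 1.
# Step 1 is taken as done: v below is an assumed closed-form solution of
# (HJB) with v(x, T) = g(x), valid for |x| <= 2.
T = 1.0
A_GRID = [-2.0 + 4.0 * i / 800 for i in range(801)]  # finite stand-in for A


def f(x: float, a: float) -> float:
    return a


def r(x: float, a: float) -> float:
    return -(x * x + a * a)


def v(x: float, t: float) -> float:
    return -math.tanh(T - t) * x * x


def feedback(x: float, t: float, d: float = 1e-6) -> float:
    """Step 2: a grid value a maximizing f(x, a) * v_x(x, t) + r(x, a).

    v_t(x, t) does not depend on a, so it is left out of the maximization.
    """
    p = (v(x + d, t) - v(x - d, t)) / (2 * d)  # central difference for v_x
    return max(A_GRID, key=lambda a: f(x, a) * p + r(x, a))


def closed_loop(x: float, t: float, steps: int = 1000) -> tuple[list[float], list[float]]:
    """Forward Euler for x*' = f(x*, alpha(x*, s)), x*(t) = x.

    Returns the states x*(s_k) and the controls alpha*(s_k) = alpha(x*(s_k), s_k).
    """
    h = (T - t) / steps
    states, controls = [x], []
    for k in range(steps):
        s = t + k * h
        a = feedback(states[-1], s)  # feedback rule (1.7)
        controls.append(a)
        states.append(states[-1] + h * f(states[-1], a))
    return states, controls


states, controls = closed_loop(1.0, 0.0)
print(f"Euler x*(T) = {states[-1]:.4f}, exact 1/cosh(T) = {1.0 / math.cosh(T):.4f}")
```

For this example the exact optimal trajectory from (1, 0) is x*(s) = cosh(T − s)/cosh T, so x*(T) = 1/cosh T ≈ 0.648. The Euler path should agree with it to within discretization and grid error.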

In summary, we design the optimal control this way: if the state of the system is x at time t, use the control which at time t takes on the parameter value a ∈ A such that the maximum in (HJB) is attained.

We demonstrate next that this construction does indeed provide us with an optimal control.

THEOREM 1.2 (VERIFICATION OF OPTIMALITY). The control α*(·) defined by the construction (1.7) is optimal.

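Proof. We have

$$P_{x,t}[\alpha^*(\cdot)] = \int_t^T r(x^*(s), \alpha^*(s))\,ds + g(x^*(T)).$$

Furthermore, according to the definition (1.7) of α*(·):

$$\begin{aligned}
P_{x,t}[\alpha^*(\cdot)] &= \int_t^T \big(-v_t(x^*(s),s) - f(x^*(s),\alpha^*(s))\cdot\nabla_x v(x^*(s),s)\big)\,ds + g(x^*(T)) \\
&= -\int_t^T \big(v_t(x^*(s),s) + \nabla_x v(x^*(s),s)\cdot\dot{x}^*(s)\big)\,ds + g(x^*(T)) \\
&= -\int_t^T \frac{d}{ds}\,v(x^*(s),s)\,ds + g(x^*(T)) \\
&= -v(x^*(T),T) + v(x^*(t),t) + g(x^*(T)) \\
&= -g(x^*(T)) + v(x^*(t),t) + g(x^*(T)) \\
&= v(x,t).
\end{aligned}$$

That is,

$$P_{x,t}[\alpha^*(\cdot)] = \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)];$$

and so α*(·) is optimal, as asserted.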
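Theorem 1.2 can be checked numerically on a hypothetical toy problem that is not from the text: n = 1, A = [−2, 2], f(x, a) = a, r(x, a) = −(x² + a²), g ≡ 0, T = 1, with the value function assumed to be v(x, t) = −tanh(T − t)x² for |x| ≤ 2. The sketch runs the feedback control (1.7) in closed loop and accumulates its payoff with a left Riemann sum along a forward Euler path. It then compares the result with v(x, t) and with the payoff of the do-nothing control α ≡ 0. The grid maximization and the names `feedback` and `payoff` are illustrative assumptions.

```python
import math

# Hypothetical toy problem (not from the text); v is an assumed closed form
# of the value function, valid for |x| <= 2.
T = 1.0
A_GRID = [-2.0 + 4.0 * i / 800 for i in range(801)]  # finite stand-in for A


def f(x: float, a: float) -> float:
    return a


def r(x: float, a: float) -> float:
    return -(x * x + a * a)


def g(x: float) -> float:
    return 0.0


def v(x: float, t: float) -> float:
    return -math.tanh(T - t) * x * x


def feedback(x: float, t: float, d: float = 1e-6) -> float:
    """alpha(x, t): grid value of a maximizing f(x, a) * v_x + r(x, a)."""
    p = (v(x + d, t) - v(x - d, t)) / (2 * d)  # central difference for v_x
    return max(A_GRID, key=lambda a: f(x, a) * p + r(x, a))


def payoff(x: float, t: float, policy, steps: int = 1000) -> float:
    """P_{x,t} for a feedback policy (state, time) -> a, via forward Euler."""
    h = (T - t) / steps
    state, running = x, 0.0
    for k in range(steps):
        s = t + k * h
        a = policy(state, s)
        running += h * r(state, a)
        state += h * f(state, a)
    return running + g(state)


x0, t0 = 1.0, 0.0
p_star = payoff(x0, t0, feedback)
p_zero = payoff(x0, t0, lambda state, s: 0.0)
print(f"P[alpha*] = {p_star:.4f}, v(x, t) = {v(x0, t0):.4f}, P[0] = {p_zero:.4f}")
```

The closed-loop payoff reproduces v(1, 0) = −tanh 1 up to discretization error, while α ≡ 0 yields exactly −1. This matches the conclusion P_{x,t}[α*(·)] = v(x, t) of the proof.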

