An AGI within the "alive" environment.
by Casian STEFAN, Principal Researcher at Essentia Mundi AI Lab. Contact: ai-AT-essentiamundi.com / ai.essentiamundi.com
Nov. 2023.
Please cite this work with a link if you derive from it, or contact me for collaboration. Thank you!
"The logical space is present in every object. For an agent, also the Environment has to have meaning. I like this dualism, so the Environment is to take the position of a "weak" proactive agent. Acting as ontological priors." (post by C. Stefan on LinkedIn).
and the idea expanded here:
"I like duality, and while researches proposes most of the time an agent`s point of view over an "alien" inert environment, I would go a step beyond, and add that the environment should also have basic cognitive characteristics, when for better alignment, to be sharing common properties - as priors to the agent, to the ontological grow. It does not need have intelligence, it just has to play a weak proactive role against the "snowball" agent, slight beyond physics. Well that approach would align itself with the Panpsychism/double-slit/constructivism views that seem to better explain reality."
(20 Nov 23).
This intuition got covered just a few days after I posted it. The "weak" agency of the environment can take the form of Q-learning, that is, reinforcement learning.
The environment in a Q-learning scenario is typically defined by a set of states, actions, and a reward structure.
States (s): States represent the different situations or configurations that the agent can find itself in. These states can be discrete or continuous. For example, in a game of chess, each board configuration could be a state. In a robotic navigation task, a state could represent the current position and orientation of the robot.
Actions (a): Actions are the decisions that the agent can take in a given state. The set of possible actions depends on the specific environment. In a chess game, actions might include moving a piece to a new position. In a robotic navigation task, actions could be moving forward, turning left or right, etc.
Rewards (r): When the agent takes an action in a certain state, it receives a numerical reward. The reward indicates the immediate benefit or cost of taking that action in that state. The goal of the agent is to learn a policy that maximizes the cumulative reward over time. Rewards can be positive, negative, or zero.
Transition Dynamics: The environment responds to the agent's actions by transitioning to a new state. The transition dynamics describe the probability of moving from one state to another after taking a particular action.
The agent interacts with the environment over multiple time steps, choosing actions in different states, receiving rewards, and updating its knowledge (Q-values) accordingly. The learning process involves exploring different state-action pairs, learning from the received rewards, and gradually improving the policy to make better decisions.
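As a concrete illustration of these four ingredients, here is a minimal sketch assuming a toy one-dimensional grid world: the state is the agent's cell index, the actions are left/right steps, and the reward is given at the goal cell. The class name GridWorld, the slip probability and the reward values are my own illustrative choices, not part of any particular library.

import random

class GridWorld:
    """Toy environment: states are cell indices 0..n-1, goal at the last cell."""
    def __init__(self, n_cells=5):
        self.n_cells = n_cells          # states (s): 0 .. n_cells-1
        self.actions = [-1, +1]         # actions (a): step left or step right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # transition dynamics: mostly deterministic, with a small chance of slipping
        if random.random() < 0.1:
            action = -action
        next_state = min(max(self.state + action, 0), self.n_cells - 1)
        # reward structure (r): +1 for reaching the goal, small cost per step
        reward = 1.0 if next_state == self.n_cells - 1 else -0.01
        done = next_state == self.n_cells - 1
        self.state = next_state
        return next_state, reward, done

# agent-environment interaction loop (random policy, no learning yet)
env = GridWorld()
state = env.reset()
for t in range(20):
    action = random.choice(env.actions)        # the agent chooses an action
    state, reward, done = env.step(action)     # the environment responds
    if done:
        state = env.reset()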
Q-Learning:
A model-free reinforcement learning algorithm used to learn a policy, which tells an agent what action to take under what circumstances, in order to maximize a reward.
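A minimal tabular sketch of the Q-learning update, reusing the toy GridWorld from above; the alpha, gamma and epsilon values are illustrative.

import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.99, 0.1       # learning rate, discount, exploration
Q = defaultdict(float)                        # Q[(state, action)] -> estimated value

def choose_action(state, actions):
    # epsilon-greedy: explore sometimes, otherwise act greedily w.r.t. Q
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

env = GridWorld()
for episode in range(500):
    state, done = env.reset(), False
    while not done:
        action = choose_action(state, env.actions)
        next_state, reward, done = env.step(action)
        # off-policy Q-learning backup: bootstrap with the *best* next action
        best_next = max(Q[(next_state, a)] for a in env.actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state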
State Action Reward State Action (SARSA):
Another reinforcement learning algorithm, similar to Q-learning, but its update uses the current state, the action taken, the reward received, the next state, and the next action actually chosen (hence the name), which makes it on-policy.
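The SARSA backup differs from Q-learning only in that it bootstraps with the next action the agent actually takes, not the maximizing one. A sketch of one episode under the same toy setup (same GridWorld, Q table and choose_action as in the Q-learning sketch above):

state, done = env.reset(), False
action = choose_action(state, env.actions)
while not done:
    next_state, reward, done = env.step(action)
    next_action = choose_action(next_state, env.actions)   # action actually taken next
    # on-policy backup: uses Q of the chosen next action, not the max
    Q[(state, action)] += alpha * (reward + gamma * Q[(next_state, next_action)]
                                   - Q[(state, action)])
    state, action = next_state, next_action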
Deep Q Network (DQN):
Combines Q-learning with deep neural networks to handle complex and high-dimensional state spaces. Particularly effective in environments with image inputs.
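A minimal sketch of the DQN idea in PyTorch: a neural network replaces the Q-table and is regressed toward a bootstrapped target. The replay buffer and target network are omitted for brevity, and the dimensions, hyperparameters and dummy batch are illustrative assumptions.

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# a dummy batch of transitions (in practice sampled from a replay buffer)
obs      = torch.randn(32, obs_dim)
actions  = torch.randint(0, n_actions, (32,))
rewards  = torch.randn(32)
next_obs = torch.randn(32, obs_dim)
dones    = torch.zeros(32)

# Q(s, a) for the actions that were taken
q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
# bootstrapped target: r + gamma * max_a' Q(s', a'), no gradient through the target
with torch.no_grad():
    target = rewards + gamma * (1 - dones) * q_net(next_obs).max(dim=1).values
loss = nn.functional.mse_loss(q_sa, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()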
Deep Deterministic Policy Gradient (DDPG):
An actor-critic algorithm for learning continuous actions, i.e. for environments with continuous action spaces. It combines ideas from DQN and deterministic policy gradients.
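A sketch of the two DDPG losses in PyTorch: a deterministic actor proposes a continuous action and a critic scores it. Target networks, exploration noise and the replay buffer are omitted; the shapes, layer sizes and dummy batch are illustrative assumptions.

import torch
import torch.nn as nn

obs_dim, act_dim, gamma = 3, 1, 0.99
actor  = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt  = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

obs, next_obs = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
acts, rewards = torch.randn(32, act_dim), torch.randn(32, 1)

# critic loss: regress Q(s, a) toward r + gamma * Q(s', mu(s'))
with torch.no_grad():
    target = rewards + gamma * critic(torch.cat([next_obs, actor(next_obs)], dim=1))
critic_loss = nn.functional.mse_loss(critic(torch.cat([obs, acts], dim=1)), target)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# actor loss: move the deterministic policy toward actions the critic rates highly
actor_loss = -critic(torch.cat([obs, actor(obs)], dim=1)).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()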
Trust Region Policy Optimization (TRPO):
A policy optimization algorithm that aims to find policies that perform well while avoiding large policy changes that could lead to catastrophic outcomes.
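The idea can be written as a constrained optimization problem; a standard formulation (in LaTeX notation) maximizes the surrogate objective while keeping the new policy close, in KL divergence, to the old one:

\max_{\theta} \; \mathbb{E}_t\!\left[ \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \, \hat{A}_t \right]
\quad \text{subject to} \quad
\mathbb{E}_t\!\left[ \mathrm{KL}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s_t) \,\|\, \pi_{\theta}(\cdot \mid s_t) \right) \right] \le \delta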
Proximal Policy Optimization (PPO):
Similar to TRPO but with improvements in terms of ease of implementation and sample efficiency.
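PPO replaces TRPO's hard KL constraint with a clipped surrogate objective. A minimal PyTorch sketch of that loss; the epsilon value and the dummy tensors are illustrative (in practice the ratio and advantages come from the policy and collected rollouts).

import torch

eps = 0.2
# probability ratio pi_theta(a|s) / pi_theta_old(a|s) and advantage estimates
ratio      = torch.exp(torch.randn(32) * 0.1)
advantages = torch.randn(32)

unclipped = ratio * advantages
clipped   = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
ppo_loss  = -torch.min(unclipped, clipped).mean()   # maximize the clipped surrogate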
Q* Search (deep learning version of A* search):
An extension of the A* search algorithm that incorporates deep learning to enhance its decision-making capabilities.
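Public details on "Q* search" are speculative; what this entry describes, an A* search whose heuristic comes from a learned model, can be sketched as below. learned_heuristic is a hypothetical stand-in for a trained network (here it just computes Manhattan distance on a grid).

import heapq

def learned_heuristic(state, goal):
    # hypothetical stand-in for a neural network estimating cost-to-go
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

def a_star(start, goal, neighbors):
    """A* search: f(n) = g(n) + h(n), with h supplied by the learned model."""
    frontier = [(learned_heuristic(start, goal), 0, start, [start])]
    visited = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in visited:
            continue
        visited.add(state)
        for nxt, cost in neighbors(state):
            if nxt not in visited:
                heapq.heappush(frontier, (g + cost + learned_heuristic(nxt, goal),
                                          g + cost, nxt, path + [nxt]))
    return None

# usage on a small 4-connected grid
def grid_neighbors(s):
    x, y = s
    return [((x + dx, y + dy), 1) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

path = a_star((0, 0), (4, 4), grid_neighbors)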
Q-Transformers:
An offline training method inspired by Q-learning, representing an improvement over Decision Transformers.
XoT - Chain-of-Thought Method:
An approach that uses search to guide Large Language Model (LLM) responses, indicating an integration of search techniques with language modeling.
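As a rough illustration only (not the actual XoT implementation), the general pattern of search-guided LLM reasoning can be sketched as a beam search over candidate "thoughts". propose_thoughts and score_thought are hypothetical stand-ins for LLM or value-model calls.

def propose_thoughts(chain):
    # hypothetical stand-in for an LLM proposing next reasoning steps
    return [chain + [f"step-{len(chain)}-{i}"] for i in range(3)]

def score_thought(chain):
    # hypothetical stand-in for an LLM (or value model) rating a partial chain
    return -len(chain[-1])

def search_guided_reasoning(question, depth=3, beam=2):
    """Simple beam search over chains of thought."""
    chains = [[question]]
    for _ in range(depth):
        candidates = [c for chain in chains for c in propose_thoughts(chain)]
        # keep only the highest-scoring partial chains (the search guiding the LLM)
        chains = sorted(candidates, key=score_thought, reverse=True)[:beam]
    return chains[0]

best_chain = search_guided_reasoning("What is 17 * 24?")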
_____________________________________
Exploration by C. Stefan, Nov. 2023
Last update: Nov. 2023
"Essentia Mundi" AI Research Lab.
Copyright © 2023 AI.EssentiaMundi.com, all rights reserved.
_
References:
https://en.wikipedia.org/wiki/Q-learning