Local Search
Local search is a heuristic method for solving computationally hard optimization problems. It can be applied to problems that can be formulated as finding a solution that maximizes a criterion among a number of candidate solutions. Local search algorithms move from solution to solution in the space of candidate solutions (the search space) by applying local changes, until a solution deemed optimal is found or a time bound elapses.
In local search, the path to the goal is not important to us; we are only trying to find the goal state itself. In many problems, our purpose is to find that goal state rather than the path that takes us there.
In the partial-state formulation, each path represents a solution to the problem (as we have seen in the systematic exploration of the search graph), but in the complete-state formulation each state represents a solution to the problem.
As you have seen previously, in the systematic exploration of the search space, the memory requirement was exponential. Here it is reduced to O(1), and the computational time complexity is reduced from exponential to O(T), where T is chosen to limit the number of iterations.
In constraint satisfaction problems, we look for states that satisfy some constraints (e.g. the n-queens problem, where the constraint is that no two queens can attack each other). On the other hand, in constraint optimization, besides satisfying some constraints, we are looking to optimize an objective function (whether minimizing or maximizing); e.g. TSP, where the objective is to minimize the total weight of the edges.
We can convert a constraint satisfaction problem to a constraint optimization problem. For example, consider the n-queens problem. We can set our objective function to
h = # pairs of queens that attack each other
or
h = # constraints that are violated
and then solve the optimization version of the problem.
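To make this concrete, here is a minimal Python sketch of the first objective above for n-queens; the board representation (one queen per column, with `state[c]` giving the row of the queen in column c) is an assumption carried through the later examples:

```python
def attacking_pairs(state):
    """Objective h: the number of queen pairs that attack each other.

    `state` is a tuple where state[c] is the row of the queen in column c
    (one queen per column).
    """
    n = len(state)
    h = 0
    for c1 in range(n):
        for c2 in range(c1 + 1, n):
            same_row = state[c1] == state[c2]
            same_diagonal = abs(state[c1] - state[c2]) == c2 - c1
            if same_row or same_diagonal:
                h += 1
    return h

print(attacking_pairs((0, 4, 7, 5, 2, 6, 1, 3)))  # a solved board: h = 0
```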
We can also convert a constraint optimization problem into a sequence of constraint satisfaction problems. We can do something like a binary search, or we can do it in linear time: e.g. to minimize a function f, we set a constraint f = a (for some reasonable a), and at each step we solve the constraint satisfaction version of the problem and decrease a by one.
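A sketch of the linear-time variant of that conversion, under the assumption of a hypothetical black-box `csp_solver(a)` that returns a state satisfying the original constraints plus f ≤ a, or `None` if no such state exists (the solver interface and bound handling are illustrative, not a fixed API):

```python
def minimize_via_csp(csp_solver, upper_bound):
    """Minimize f by solving a sequence of satisfaction problems,
    tightening the bound a by one each time (a binary search on a
    would also work)."""
    best = None
    a = upper_bound
    while True:
        solution = csp_solver(a)   # hypothetical black-box CSP solver
        if solution is None:
            return best            # nothing satisfies f <= a: best is optimal
        best = solution
        a -= 1
```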
Both algorithms are asymptotically complete (if the state space is finite, each state is asymptotically visited at a fixed rate).
A better approach is to use a local search algorithm that continuously moves in the direction of increasing elevation/value to find the peak of the mountain, i.e. the best solution to the problem. The hill-climbing algorithm terminates when it reaches a peak, where no neighbor has a higher value.
Note that in this algorithm, nodes only contain the state and the value of the objective function in that state (not the path), so the algorithm may revisit previous states.
It is also called greedy local search, as it only looks at its immediate good neighbor states and not beyond them. This may seem like a good thing, but it is not: hill climbing can easily get stuck in a local optimum, so its convergence depends on the initial state.
States: 8 queens on the board, one per column ($8^8 \approx 17$ million states)
Successors(s): all states resulting from s by moving a single queen to another square in the same column ($8 \times 7 = 56$ successors)
Cost function h(s): number of queen pairs that are attacking each other, directly or indirectly
Global minimum: h(s) = 0
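Putting this formulation into code, a minimal hill-climbing sketch (greedy descent on h, reusing `attacking_pairs` from earlier) might look like this:

```python
import random

def successors(state):
    """All boards reachable by moving one queen within its own column."""
    n = len(state)
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]

def hill_climbing(n=8):
    """Greedy descent on h: stop at the first state no neighbor improves."""
    current = tuple(random.randrange(n) for _ in range(n))
    while True:
        best = min(successors(current), key=attacking_pairs)
        if attacking_pairs(best) >= attacking_pairs(current):
            return current  # a local minimum; h may still be > 0
        current = best
```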
In the example above, the hill-climbing algorithm converges to h = 1 and no action can improve h, so it is stuck at a local minimum.
So the solutions that improve the hill-climbing algorithm take recurring states and local optima into account.
For convex or concave functions, like the examples below, the hill-climbing algorithm reaches the optimal solution, because the local optimum is the same as the global optimum.
In some problems, like n-queens, converging to a local optimum is not acceptable, and we should find better solutions.
A technique that may help is to use sideways moves: if there are no downhill (uphill) moves, allow sideways moves in the hope that the algorithm can escape shoulders. There should be a limit on the number of sideways moves allowed, to avoid infinite loops.
For the 8-queens problem, if we set the sideways-move limit to 100, the percentage of problems solved rises from 14% to 94%, with 21 steps needed on average for each successful instance and 64 for each failure. So with sideways moves the probability of converging to the optimal solution increases, but the convergence time increases too.
When sideways moves are allowed, performance improves ...
Another modification that helps hill climbing become complete is stochastic variation. When the state-space landscape has local optima, any search that moves only in the greedy direction cannot be complete.
The idea of stochastic variation is to combine a random walk with greedy hill climbing.
At each step do one of the following:
- with some probability p, move to a random neighbor (a random-walk step);
- otherwise, move to the best neighbor as usual (a greedy hill-climbing step).
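A sketch of one such step, reusing the 8-queens helpers above (the mixing probability p = 0.1 is an arbitrary assumption):

```python
import random

def stochastic_step(current, p=0.1):
    """With probability p take a random-walk step, otherwise a greedy step."""
    if random.random() < p:
        return random.choice(list(successors(current)))
    return min(successors(current), key=attacking_pairs)
```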
All the previous versions are incomplete because they can get stuck at local optima, but random-restart hill climbing gives a complete algorithm.
In random-restart hill climbing, start from a random initial state; if the search terminates with failure, choose another random initial state, and so on...
If p is the probability of success of each hill-climbing search, then the expected number of restarts is 1/p. For 8-queens, for example, p ≈ 0.14 (see above), so about 1/0.14 ≈ 7 restarts are needed on average.
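A minimal sketch, wrapping the hill climbing from above (the restart limit is an arbitrary assumption):

```python
def random_restart(max_restarts=100):
    """Rerun hill climbing from fresh random states until a solution is found."""
    for _ in range(max_restarts):
        state = hill_climbing()
        if attacking_pairs(state) == 0:
            return state  # global minimum reached
    return None  # give up after max_restarts attempts
```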
When multiple restarts are allowed, performance improves...
If we want to increase the randomness, we can combine the ideas of greedy local search, random walk, and random restart. At each step, do one of the three with equal probability:
- a greedy step (move to the best neighbor),
- a random-walk step (move to a random neighbor),
- a random restart (jump to a completely random state).
Tabu search works like hill climbing, but it maintains a tabu list of constant size k to avoid getting stuck at local optima. The tabu list holds the k most recently used objects, which are taboo to use for now; moves that involve an object in the tabu list are not accepted. Tabu search raises the space complexity from O(1) to O(k), but in many problems that use sideways moves it improves the performance of hill climbing.
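A sketch of one way to realize this, reusing the earlier helpers; here the tabu "objects" are whole states (many variants store moves or state components instead), and the step and list-size limits are arbitrary assumptions:

```python
from collections import deque

def tabu_search(start, steps=1000, k=50):
    """Hill climbing with a fixed-size memory of recently visited states."""
    current = best = start
    tabu = deque(maxlen=k)  # raises space complexity from O(1) to O(k)
    for _ in range(steps):
        tabu.append(current)
        candidates = [s for s in successors(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=attacking_pairs)  # may be a sideways/bad move
        if attacking_pairs(current) < attacking_pairs(best):
            best = current
    return best
```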
Simulated annealing (SA) is another search algorithm. The idea behind SA is close to the random walk used in other search algorithms. SA uses a physical analogy to escape local optima: it allows some bad moves, but gradually decreases their size and frequency, because if these bad moves continued while the answer was somewhere near the global optimum, they would move it away from the global optimum. This method was proposed in 1983 by IBM researchers for solving VLSI layout problems.
Let's have a look at a physical analogy:
The picture below demonstrates the places where the ball will probably be; the next state gets closer to the goal state, which is the global optimum:

Simulated annealing refers to the process of cooling a liquid until it forms a crystalline shape. This physical process must be done slowly to form better crystalline shapes. At first, the molecules of the liquid have too much kinetic energy and move quickly with Brownian motion; by cooling it slowly, the energy gets less and less. Because we reduce the temperature gradually, the number of bad moves, i.e. moves with too much energy, decreases until the process converges to the global optimum.
Based on this intuition, SA always accepts good moves, and accepts a bad move with probability $e^{\Delta E / T}$, where $\Delta E < 0$ is the change in the objective value and T is the current temperature; as T decreases, bad moves become rarer.
The following pseudocode presents the simulated annealing heuristic as described above:
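The pseudocode itself is not reproduced here; the following is a minimal runnable Python sketch of the same idea, minimizing h with the 8-queens helpers from earlier (the step limit is an arbitrary assumption, and `schedule` maps the step number to a temperature):

```python
import math
import random

def simulated_annealing(start, schedule, steps=100_000):
    """Always accept improving moves; accept a worsening move with
    probability exp(delta_e / T), which shrinks as T cools down."""
    current = start
    for t in range(1, steps + 1):
        T = schedule(t)
        if T <= 0:
            break
        candidate = random.choice(list(successors(current)))
        # delta_e > 0 means the candidate is better (lower h).
        delta_e = attacking_pairs(current) - attacking_pairs(candidate)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = candidate
    return current
```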
This picture illustrates two points:
In this example, SA is searching for a maximum. By cooling the temperature slowly, the global maximum is found.

How should the cooling schedule be defined?
There is only one theorem about this.
Theorem: If T is decreased sufficiently slowly, the global optimum will be reached with probability approaching 1.
Is this rate the same for all problems?
No, it depends on the problem.
Now, is this theorem a useful guarantee?
Convergence can be guaranteed if at each step T drops no more quickly than $\frac{C}{\log n}$, where C is a problem-dependent constant and n is the number of steps so far. In practice, different values of C are tried to find the best choice for the problem.
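For example, a logarithmic schedule in the spirit of this theorem could be plugged into the SA sketch above; the constant C = 2.0 and the +2 offset are assumptions, the latter just to keep the temperature finite and positive:

```python
import math

def log_schedule(C=2.0):
    """Cooling schedule that drops no faster than C / log n."""
    return lambda n: C / math.log(n + 2)  # +2 keeps log(...) positive from n = 0

# e.g. simulated_annealing(start, schedule=log_schedule(C=2.0))
```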
Keeping only one node in memory is an extreme reaction to memory problems.
Local beam search is another algorithm; it keeps track of k states instead of keeping only one node in memory.
Now, how? Begin with k randomly generated states. At each step, generate all the successors of all k states; if any of them is a goal, stop, otherwise select the k best successors from the complete list and repeat.
This is an example of Local Beam Search when k = 3:
$\star$ Note that this algorithm is not the same as k random-restart searches run in parallel: in beam search, searches that find good states recruit other searches to join them.
Quite often, all k states end up on the same local hill.
So, what should be done?
The idea is to use stochastic beam search, which chooses the k successors randomly, biased towards good ones.
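A sketch of this on 8-queens, reusing the earlier helpers; the sampling weights of 1 / (1 + h) are just one plausible way to bias towards good (low-h) states, and the limits are assumptions:

```python
import random

def stochastic_beam_search(k=3, steps=1000, n=8):
    """Keep k states; sample the next k from the successor pool,
    biased towards states with low h."""
    beam = [tuple(random.randrange(n) for _ in range(n)) for _ in range(k)]
    for _ in range(steps):
        pool = [s for state in beam for s in successors(state)]
        solved = [s for s in pool if attacking_pairs(s) == 0]
        if solved:
            return solved[0]
        weights = [1.0 / (1 + attacking_pairs(s)) for s in pool]
        beam = random.choices(pool, weights=weights, k=k)
    return min(beam, key=attacking_pairs)
```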
This algorithm has many uses; in particular, the genetic algorithm, described next, is a variant of stochastic beam search in which:
A state (solution) is represented as a string over a finite alphabet (e.g. a chromosome containing genes)
Start with k randomly generated states (the population)
An evaluation function evaluates states, giving higher values to better states (the fitness function)
Two parent states are combined to produce offspring (crossover)
Reproduced states can be slightly modified with some probability P (mutation)
The next generation of states is produced by selection (based on the fitness function), crossover, and mutation
Keeping all of this in mind, we will go into more detail in the example below:
Describe the state as a string
Crossover: select some part of the state from one parent and the rest from the other
Mutation: change a small part of one state with a small probability
In the picture below you can see how the fitness function was calculated
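Here is a minimal sketch of those pieces for 8-queens, reusing `attacking_pairs` from earlier; the fitness function (number of non-attacking pairs), the population size, and the mutation rate are all assumptions for illustration:

```python
import random

def fitness(state):
    """Number of non-attacking queen pairs (28 is the maximum for 8 queens)."""
    n = len(state)
    return n * (n - 1) // 2 - attacking_pairs(state)

def crossover(x, y):
    """Take a prefix of one parent's string and the suffix of the other's."""
    cut = random.randrange(1, len(x))
    return x[:cut] + y[cut:]

def mutate(state, p=0.1):
    """With probability p, move one queen to a random row."""
    if random.random() < p:
        col = random.randrange(len(state))
        state = state[:col] + (random.randrange(len(state)),) + state[col + 1:]
    return state

def genetic_algorithm(k=20, generations=1000, n=8):
    population = [tuple(random.randrange(n) for _ in range(n)) for _ in range(k)]
    for _ in range(generations):
        weights = [fitness(s) for s in population]  # fitness-proportional selection
        population = [
            mutate(crossover(*random.choices(population, weights=weights, k=2)))
            for _ in range(k)
        ]
        best = max(population, key=fitness)
        if attacking_pairs(best) == 0:
            return best
    return max(population, key=fitness)
```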
Positive points
Random exploration can find solutions that local search can't (primarily via crossover)
Appealing connection to human evolution ("neural" networks and "genetic" algorithms are metaphors!)
Negative points
All the introduced methods deal with search problems that are computationally hard. These algorithms move from solution to solution in the space of candidate solutions (the search space) by applying local changes, until a solution deemed optimal is found or a time bound elapses.
As mentioned before, in the systematic exploration of the search space, the space complexity was exponential; using these local algorithms, it is reduced to O(1), and the computational time complexity, which was exponential, is reduced to O(T), where T is the number of iterations.