Note: This is only a partially written book, which doesn’t seem to have had any updates since 2008. I read it anyway because:

- It was free
- Focused on the type of optimization I am interested in (direct search)
- Even though it is unfinished, at 800+ pages it is quite long and extensive
- The document itself was quite readable

So anyway, here goes:

——-ch1 [Intro]

(p. 63) The correlation length measures how quickly the autocorrelation function (a simple statistic of the landscape) decreases, and thereby summarizes the ruggedness of the landscape
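A quick way to see this in code: walk randomly over a landscape, record the fitness series, and estimate its autocorrelation at lag 1; one common definition of the correlation length is then -1/ln ρ(1). A minimal sketch, using OneMax as the toy landscape (my choice of example, not the book's):

```python
import math
import random

def random_walk_fitness(n_steps=10000, n_bits=32, seed=1):
    """Fitness values along a one-bit-flip random walk on a OneMax landscape."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    values = []
    for _ in range(n_steps):
        values.append(sum(x))          # OneMax fitness: number of ones
        x[rng.randrange(n_bits)] ^= 1  # flip one random bit per step
    return values

def autocorrelation(values, lag):
    """Empirical autocorrelation of the fitness series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    cov = sum((values[t] - mean) * (values[t + lag] - mean)
              for t in range(n - lag)) / (n - lag)
    return cov / var

vals = random_walk_fitness()
r1 = autocorrelation(vals, 1)
corr_len = -1.0 / math.log(r1)  # larger correlation length = smoother landscape
```

For OneMax the walk decorrelates slowly (ρ(1) = 1 - 2/n for n bits), so the estimated correlation length comes out large; a rugged landscape would give a value near zero.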

(p. 71) “The effects of noise in optimization have been studied by various researchers; Miller and Goldberg [1416, 1415], Lee and Wong [1268], and Gurin and Rastrigin [870] are some of them. Many global optimization algorithms and theoretical results have been proposed which can deal with noise. Some of them are, for instance, specialized genetic algorithms [685, 2062, 2060, 1799, 1800, 1146], Evolution Strategies [195, 100, 881], and Particle Swarm Optimization [1606, 884] approaches.”

– [685] J. Michael Fitzpatrick and John J. Grefenstette. Genetic algorithms in noisy environments. Machine Learning, 3(2–3):101–120, October 1988. ISSN: 0885-6125 (Print), 1573-0565 (Online). doi:10.1007/BF00113893. Online available at http://www.springerlink.com/content/n6w328441t63q374/fulltext.pdf

– [1415] Brad L. Miller and David E. Goldberg. Genetic algorithms, tournament selection, and the effects of noise. IlliGAL Report 95006. Online available at http://citeseer.ist.psu.edu/86198.html and http://www.illigal.uiuc.edu/pub/papers/IlliGALs/95006.ps.Z

– [1416] Brad L. Miller and David E. Goldberg. Genetic algorithms, selection schemes, and the varying effects of noise. Evolutionary Computation, 4(2):113–131, Summer 1996. ISSN: 1063-6560. doi:10.1162/evco.1996.4.2.113. See also [1415]. Online available at http://citeseer.ist.psu.edu/110556.html and http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=4225507A21F8083F025E28499309D1FB?doi=10.1.1.31.3449&rep=rep1&type=pdf

(p. 71) More discussion of when tolerance of noise/robustness is necessary, along with discussion of an unstable global optimum (needle) vs. a stable local optimum

(p. 76) Discussion of non-stationary optimization

(p. 78) No Free Lunch Theorem: it is impossible for any optimization algorithm to outperform random walks or exhaustive enumeration on all possible problems. For every problem where a given method leads to good results, we can construct a problem where the same method has the opposite effect. Doing exactly this is a common method for comparing/benchmarking optimization algorithms.

(p. 79) Because of this, all useful optimization algorithms utilize some problem-specific knowledge/assumptions. This means that having many optimization methods can be useful, because each may have different assumptions and only one may be most suitable for any particular problem.

(p. 87) Applications of global optimization and conferences/workshops

——-ch2 [Evolutionary Algorithms]

——-ch3 [Genetic Algorithms]

——-ch4 [Genetic Programming]

——-ch5 [Evolution Strategies]

——-ch6 [Evolutionary Programming]

——-ch7 [Learning Classifier Systems]

(p. 233) A special case of production systems

——-ch8 [Ant Colony Optimization]

——-ch9 [Particle Swarm Optimization]

(p. 249) There is a population in the swarm; each individual can communicate with a subset of the swarm in its neighborhood

(p. 250) Each individual updates its position based on its velocity, and then updates its velocity based on its own and its neighbors’ information
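The two update rules described can be sketched as below. This is a minimal global-best PSO (the neighborhood is the whole swarm); the inertia and acceleration constants are values I picked, not taken from the book:

```python
import random

def pso_minimize(f, dim=2, n_particles=20, iters=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO sketch: every particle's neighborhood is the full swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's own best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best seen by the neighborhood
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity: inertia + pull toward own best + pull toward neighbors' best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]           # then the position update
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

best, val = pso_minimize(lambda x: sum(xi * xi for xi in x))  # sphere function
```

Restricting `gbest` to a per-particle subset of the swarm (e.g. a ring topology) gives the local-best variant the p. 249 note alludes to.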

…Ch is very incomplete

——-ch10 [Hill Climbing]

(p. 253) Hill climbing as a genetic algorithm w/population size 1
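That framing can be made literal: mutate a single individual and let "selection" keep the better of parent and child. A sketch on the OneMax toy problem (my example, not the book's):

```python
import random

def hill_climb(fitness, n_bits=40, iters=2000, seed=0):
    """Hill climbing as a GA with population size 1."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    fx = fitness(x)
    for _ in range(iters):
        y = x[:]
        y[rng.randrange(n_bits)] ^= 1  # "mutation": flip one random bit
        fy = fitness(y)
        if fy >= fx:                   # "selection": child replaces parent if not worse
            x, fx = y, fy
    return x, fx

best, val = hill_climb(sum)            # OneMax: maximize the number of ones
```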

(p. 257) Raindrop method

——-ch11 [Random Optimization]

(p. 259) This just feels like a special case of hill climbing where the neighbor is sampled randomly?
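That reading can be made concrete: draw the candidate from a Gaussian around the current point and keep it only if it improves (roughly Matyas-style random optimization). The step size and test function below are my own choices:

```python
import random

def random_optimize(f, x0, iters=5000, sigma=0.5, seed=0):
    """Random optimization: Gaussian-perturbation hill climbing in R^n."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        y = [xi + rng.gauss(0.0, sigma) for xi in x]  # random neighbor
        fy = f(y)
        if fy < fx:                                   # accept improvements only
            x, fx = y, fy
    return x, fx

best, val = random_optimize(lambda x: sum(xi * xi for xi in x), [3.0, -4.0])
```

The only difference from the bitstring hill climber is the neighborhood: a continuous random distribution instead of an enumerable set of single-bit flips.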

——-ch12 [Simulated Annealing]

——-ch13 [Extremal Optimization]

(p. 269) Also based on physics; comes from the “metaphor of thermal equilibria.” Based on the idea of “self-organized criticality”: large interactive systems evolve into a state where a change in one element can lead to drastic changes all over the system. The number of elements involved in such a change follows a power law, proportional to n^(-τ).

(p. 270) There is a modified version of this where individual genes can be modified if each contributes by itself to fitness. Normally this is hard to do, but one can measure the difference in fitness between 2 genomes when one element is changed.
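A toy sketch of that modified scheme: on a separable problem every gene's fitness contribution is directly visible, so components can be ranked worst-first and mutated with power-law (rank^-τ) probability. Everything here (OneMax, τ = 1.5, the iteration budget) is my own illustrative choice, not from the book:

```python
import random

def tau_eo(n=30, tau=1.5, iters=3000, seed=0):
    """tau-EO sketch on a separable toy problem (OneMax), where each gene's
    contribution to fitness can be read off on its own."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    best, best_val = x[:], sum(x)
    # power-law probabilities over ranks: P(rank k) proportional to k^-tau
    weights = [k ** -tau for k in range(1, n + 1)]
    for _ in range(iters):
        ranked = sorted(range(n), key=lambda i: x[i])  # worst (zero) bits first
        i = rng.choices(ranked, weights=weights)[0]    # prefer bad components
        x[i] ^= 1                # mutate unconditionally; no acceptance test
        if sum(x) > best_val:    # EO only ever tracks the best-so-far solution
            best, best_val = x[:], sum(x)
    return best, best_val

best, val = tau_eo()
```

The unconditional mutation is what distinguishes EO from hill climbing: the walk keeps moving, and quality is preserved only in the best-so-far record.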

——-ch14 [Tabu Search]

(p. 273) Basic idea is to store a list of configurations tested and not re-test them. Can also try to store properties of tested configurations instead of the actual configurations, which can lead to the algo. neglecting (potentially good) areas of the search space that were so far untested

(p. 274) It is very similar to hill climbing or simulated annealing
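The basic scheme can be sketched as: move to the best neighbor not on the tabu list, even if it is worse than the current point, and keep the list bounded FIFO-style. The toy problem and parameters are mine:

```python
import random

def tabu_search(fitness, n_bits=20, iters=500, tabu_size=100, seed=0):
    """Tabu search sketch: never re-test a configuration already on the list."""
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(n_bits))
    tabu = {x}                             # configurations already tested
    order = [x]                            # FIFO queue so the list stays bounded
    best, best_val = x, fitness(x)
    for _ in range(iters):
        # collect all non-tabu single-bit-flip neighbors
        neighbors = [x[:i] + (1 - x[i],) + x[i + 1:]
                     for i in range(n_bits)
                     if x[:i] + (1 - x[i],) + x[i + 1:] not in tabu]
        if not neighbors:
            break                          # whole neighborhood is tabu
        x = max(neighbors, key=fitness)    # move even if worse: escapes local optima
        tabu.add(x)
        order.append(x)
        if len(order) > tabu_size:
            tabu.discard(order.pop(0))     # forget the oldest entry
        v = fitness(x)
        if v > best_val:
            best, best_val = x, v
    return best, best_val

best, val = tabu_search(sum)               # OneMax again as the toy objective
```

Storing hashed properties instead of full configurations would shrink the list, at the cost of the over-exclusion problem the p. 273 note describes.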

——-ch15 [Memetic + Hybrid Algorithms]

(p. 277) Basic idea is to combine GAs with some other method (such as simulated annealing) in concert to do optimization

(p. 280) “Baldwinian evolution” can be added by applying a local search to each new individual genome

——-ch16 [Downhill Simplex/Nelder-Mead] *read more on this*

(p. 283) Uses only values of objective functions w/o derivative information, so is a direct search method

(p. 283) “Downhill simplex optimization uses n+1 points in the Rn. These points form a polytope, a generalization of a polygon, in the n-dimensional space. Nondegenerated simplexes, i.e., those where the set of edges adjacent to any vertex form a basis in the Rn, have one important feature: The result of replacing a vertex with its reflection through the opposite face is again a nondegenerated simplex (see Fig. 16.1.a). The goal of downhill simplex optimization is to replace the best vertex of the simplex with an even better one or to ascertain that it is a candidate for the global optimum [1276]. Therefore, its other points are constantly flipped around in an intelligent manner as we will outline in Section 16.3.”

(p. 283) Gets stuck in local optima, so restarts are recommended
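The reflection step described above is easy to sketch. Note this is reflection only; the full Nelder-Mead method also uses expansion, contraction, and shrink moves, which I omit here:

```python
def reflect_worst(simplex, f):
    """One step: mirror the worst vertex through the centroid of the
    remaining face, keeping the reflection only if it beats the worst."""
    simplex = sorted(simplex, key=f)       # order vertices best ... worst
    worst = simplex[-1]
    n = len(worst)
    centroid = [sum(v[d] for v in simplex[:-1]) / n for d in range(n)]
    reflected = [2 * centroid[d] - worst[d] for d in range(n)]
    if f(reflected) < f(worst):
        simplex[-1] = reflected
    return simplex

sphere = lambda p: sum(x * x for x in p)
simplex = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]  # 3 points for a 2-D problem
for _ in range(50):
    simplex = reflect_worst(simplex, sphere)
best = min(simplex, key=sphere)
```

Reflection alone keeps the simplex size fixed, so it eventually stalls near an optimum; that is part of why the full method adds contraction/shrink moves, and why restarts are recommended when it gets stuck.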

——-ch17 [State Space Search]

(p. 289) Output of the function being searched is binary, not real. Either a configuration is successful or not.

(p. 289) State space search algos can be applied to optimization by thresholding the output of the function being optimized

(p. 289) All of the search functions described are deterministic

(p. misc) Talk about BFS/DFS/Iterative Deepening, random walks, A*
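The thresholding trick from p. 289 is essentially a one-line wrapper; the threshold value and toy objective here are mine:

```python
def as_goal_test(objective, threshold):
    """Wrap a real-valued objective into the binary goal predicate that
    state space search expects: a state either succeeds or it does not."""
    return lambda state: objective(state) >= threshold

# toy example: a bit list "succeeds" once it contains at least 3 ones,
# so BFS/DFS/A* can search for any state passing this test
is_goal = as_goal_test(sum, 3)
```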

——-ch18 [Parallelization and Distribution]

——-ch19 [Maintaining the Optimal Set]

(p. 309) This is only for multi-objective optimization. Otherwise this is trivial

(p. 310) Methods of gridding and clustering to keep track of optimal elements

——-ch21 [Benchmarks]

Rest of book is applications, background, implementation details

Update: recently heard about pattern search.