Today, I will continue my posts on a chess-playing FPGA, which we introduced in a previous blog. We've just finished considering the Move Generator and how it can select a "preferred" move from a given position. We've also discussed how a Board Position Evaluation function can be used to make our search of the tree of all possible moves practical, so we can identify our best move. So now, let's figure out how to implement the evaluation function using the Move Generator cellular array.
The Move Generator cellular array
The Move generator, as we have defined it so far, is organized as a cellular array of very similar combinatorial logic functions with a logic "cell" corresponding to each square on the chess board (thereby making the Move Generator an 8x8 array of logic functions).
Each chessboard square contains a piece (even if that piece is a blank) that we can use to help construct the evaluation function that reports the "value" of the board. We have been considering a simple evaluation function that just sums the values of each piece on the board as a way to measure how close one side (or the other) is to winning.
If you have more "material" than your opponent, then -- hopefully -- you have a better chance of winning. (In my previous blog, we discussed the "horizon problem" associated with this approach. It might be that you have more material, but could be checkmated in the next move. This problem can't be solved with the standard tree search algorithm we are using, but hopefully these results will be avoided as moves are made and the horizon is extended past the problem position.)
So, how can we use the cellular array to compute the value of the pieces on the board? The function value we want to return is the sum of the values of all our pieces minus the sum of the values of the opponent's pieces (if this is a positive number, the board is good for us; if negative, it's bad for us).
It might make sense to use adders to compute this value. We could place an adder at each square, and combine the "accumulated value" of the squares this far with the value of the piece in the current square. The output of the adder would propagate to the next square and continue to the end of the board. A diagram reflecting this design is illustrated below (click here to see a larger, more detailed version of this image):
A four-square section of the array is expanded on the left of the diagram and shows the adder inside each cell, and the "SumIn" and "SumOut" busses that aggregate the board's piece value. Each cell is connected to its "upstream" and "downstream" cell counterparts, and the connections "snake" around the entire board to cover all the squares (the turns at the edges of the board are not shown for simplicity).
This approach would require 64 adders -- one per cell. The delay would thus be 64 times the adder delay. This might be fairly long, and could end up slowing down the computation. Of course, we can use a different approach -- opposed to creating our "aggregator" in a linear fashion, we could instead use a tree structure. The figure below illustrates this approach (click here to see a larger, more detailed version of this image):
A four-square section of the array is expanded in the left side of the diagram. Inside this group of four cells are three adders to create the sum of the four pieces (perhaps a knight, a pawn, and two blank squares as illustrated here).
So, we need three adders for each group of four cells. The outputs from every pair of these groups then need to be summed (as shown in the right side of the diagram). In fact, every pair of groups is summed and summed again until we reach the final adder -- which is conceptually located in the very center of the board -- that computes the final result.
How many adders does this tree structure approach require? Well, counting all the adders in the diagram, we find it takes (16 x 3) + 8 + 4 + 2 + 1, or 63. Although this is only one less adder than our original solution, the number of delays is much reduced to only six (can you trace the signal flow thru the diagram to see this?), as opposed to the 64 levels required by the linear approach. Thus, from a logic standpoint, it would seem the tree approach is to be preferred.
On the other hand, this new design does consume more intermediate routing resources than the "linear" approach (where we had a simple left-to-right, right-to-left signal flow), but even so, we would expect the tree structure to offer us a significantly faster result.
So, what do you think? Is this the best approach to compute the overall piece value of the board position? Maybe there are "tricks" we can use to speed up the process even more? Here's a clue -- think about how the board position changes from move to move.
Feel free to leave your ideas in the comment section below, and we will revisit the evaluation function in next week's blog, along with some additional ideas.
Related posts: