Actually, scratch what I was saying. It's a load of bunk.
The key is that you shouldn't be having the NN play while it's training. Instead, you devise two other AIs and have them play each other.
One plays perfectly, and should always win or draw - we'll call that AI PP. The other should simply play at random, without any consideration as to whether its moves are worthwhile or not. We'll call that AI R.
As the games progress, take each state of the board at the start of PP's turn, and use that as an input. Pass the state of the board after PP's turn as the expected output. You otherwise don't need to record anything about the games.
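As a rough sketch of that recording step, the Python below plays PP against R and stores a (board before PP's move, board after PP's move) pair for each of PP's turns. Note that PP here is a minimax search standing in for the rule-based player described further down, which is my substitution and not part of the original scheme. The board encoding (a 9-tuple of "X", "O" and " ") and all function names are also just assumptions for illustration.

```python
import random

# Board: tuple of 9 cells ("X", "O" or " "), indexed row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def place(board, i, mark):
    return board[:i] + (mark,) + board[i + 1:]

def other(mark):
    return "O" if mark == "X" else "X"

_cache = {}

def minimax(board, to_move):
    """Value for the side to move: +1 win, 0 draw, -1 loss."""
    key = (board, to_move)
    if key not in _cache:
        if winner(board) is not None:
            _cache[key] = -1  # the previous mover just completed a line
        elif not legal_moves(board):
            _cache[key] = 0
        else:
            _cache[key] = max(-minimax(place(board, m, to_move), other(to_move))
                              for m in legal_moves(board))
    return _cache[key]

def pp_move(board, mark):
    """Stand-in for PP: choose a move with the best minimax value."""
    return max(legal_moves(board),
               key=lambda m: -minimax(place(board, m, mark), other(mark)))

def r_move(board, rng):
    """R: any legal move, chosen uniformly at random."""
    return rng.choice(legal_moves(board))

def play_game(rng, pp_mark="X"):
    """Play PP against R; return (training pairs, winner or None)."""
    board, to_move, pairs = (" ",) * 9, "X", []
    while winner(board) is None and legal_moves(board):
        if to_move == pp_mark:
            after = place(board, pp_move(board, pp_mark), pp_mark)
            pairs.append((board, after))  # input, expected output
            board = after
        else:
            board = place(board, r_move(board, rng), to_move)
        to_move = other(to_move)
    return pairs, winner(board)

rng = random.Random(0)
dataset = []
for game in range(200):
    # Alternate who moves first so PP is seen both opening and replying.
    pairs, result = play_game(rng, pp_mark="XO"[game % 2])
    dataset.extend(pairs)
```

Each entry in `dataset` would still need converting into whatever numeric input/output format your network expects; that part depends entirely on your NN setup.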
After a sufficient number of games have been simulated, you should be able to substitute PP with NN and it should continue to always win or draw. You may wish to occasionally substitute R with another instance of PP during training - it's important that NN learn how to deal with R, but it'll become a better player faster if it watches PP against PP every now and then. NN itself wouldn't make a good substitute for R, because it'll start shifting from R's behaviour to PP's behaviour, and as a result will be less likely to teach itself how to deal with "bad" moves performed by its opponent.
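To check the "always win or draw" claim once NN is swapped in, something like the harness below could be used. It's only a sketch: `first_free_policy` is a deliberately dumb placeholder for where the trained NN would go, and the `(board, mark, rng)` policy signature is an assumption of mine. The thing to watch is the loss count, which should be zero for a player that really has learned PP's behaviour.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_policy(board, mark, rng):
    """R: pick any empty square at random."""
    return rng.choice([i for i, cell in enumerate(board) if cell == " "])

def first_free_policy(board, mark, rng):
    """Placeholder for the trained NN: takes the first empty square."""
    return board.index(" ")

def evaluate(policy, opponent, games, seed=0):
    """Play `policy` against `opponent`; return win/draw/loss counts."""
    rng = random.Random(seed)
    tally = {"win": 0, "draw": 0, "loss": 0}
    for g in range(games):
        mine = "XO"[g % 2]  # alternate who moves first
        board, to_move = [" "] * 9, "X"
        while winner(board) is None and " " in board:
            mover = policy if to_move == mine else opponent
            board[mover(tuple(board), to_move, rng)] = to_move
            to_move = "O" if to_move == "X" else "X"
        w = winner(board)
        tally["draw" if w is None else "win" if w == mine else "loss"] += 1
    return tally

tally = evaluate(first_free_policy, random_policy, games=500)
print(tally)
```

Passing a PP-style function as `opponent` for some of the games would cover the PP-against-PP case as well.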
Both PP and R could be imitated by human players, but the number of matches they'd need to play would make that somewhat prohibitive. It's also important that every move NN is taught is a move that PP would make; you don't want to teach it "mistakes". Moves that aren't "perfect play" may win under certain circumstances, but that doesn't mean they should be encouraged.
PP's AI would be coded according to these rules:
- There are three possible opening moves; side, corner, or middle. Everything else is a reflection/rotation of one of those.
- Unless the opponent has already taken a corner before it can, PP should attempt the following:
    X| |     X| |     X| |X    X| |X
    -+-+-    -+-+-    -+-+-    -+-+-
     | |      | |      | |      | |X
    -+-+-    -+-+-    -+-+-    -+-+-
     | |      | |X     | |X     | |X
- If the opponent has already taken a corner, PP should instead attempt the following:
     | |      |X|      |X|
    -+-+-    -+-+-    -+-+-
     |X|      |X|      |X|
    -+-+-    -+-+-    -+-+-
     | |      | |      |X|
- If PP's chosen pattern is blocked, then it should simply attempt to create lines of three, making use of whatever it's already placed on the board.
- PP should stop its current tactic and attempt to block the opponent if the opponent would otherwise win on their next turn.

I'm fairly sure that's sufficient for perfect play.
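Two of the rules above are easy to sketch in Python: the claim that there are only three distinct opening moves, and the "block the opponent" check. This isn't a full PP implementation, and the square numbering (0 to 8, row by row) and the function names are my own assumptions.

```python
# Squares are indexed 0..8, row by row; a board is a tuple of "X", "O", " ".
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

ROTATE = (6, 3, 0, 7, 4, 1, 8, 5, 2)  # new[i] = old[ROTATE[i]]: quarter turn
MIRROR = (2, 1, 0, 5, 4, 3, 8, 7, 6)  # left-right reflection

def symmetries(board):
    """Yield all 8 rotations/reflections of a board."""
    for _ in range(4):
        board = tuple(board[i] for i in ROTATE)
        yield board
        yield tuple(board[i] for i in MIRROR)

def canonical(board):
    """Pick one representative per symmetry class."""
    return min(symmetries(board))

def opening_classes():
    """Group the 9 possible first moves by symmetry class."""
    classes = {}
    for square in range(9):
        board = tuple("X" if i == square else " " for i in range(9))
        classes.setdefault(canonical(board), []).append(square)
    return sorted(classes.values())

def completing_square(board, mark):
    """Return a square that gives `mark` three in a row now, else None."""
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(mark) == 2 and cells.count(" ") == 1:
            return line[cells.index(" ")]
    return None

def must_block(board, me):
    """Square `me` has to take to stop the opponent winning next turn."""
    opponent = "O" if me == "X" else "X"
    return completing_square(board, opponent)

print(opening_classes())  # corners, sides, middle
print(must_block(tuple("OO  X    "), "X"))
```

The same `completing_square` helper can be called with PP's own mark to spot a finishing move, which fits the "create lines of three" rule.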