[WIP] increase the number of recipes
Most existing NNUE evaluation functions in Shogi use nodchip's pre-trained model as a starting point. The recipe for nodchip's pre-trained model is given here.
Apery's generation algorithm is the most similar to AlphaZero's: the starting positions for self-play are generated from existing game records (human professionals) plus random moves.
The following commands are used.
learn targetdir path_to_training_data loop 100 batchsize 1000000 lambda 1.0 eta 1.0 newbob_decay 0.5 eval_save_interval 500000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name path_to_validation_data nn_batch_size 1000 eval_limit 32000
learn targetdir path_to_training_data loop 100 batchsize 1000000 lambda 0.5 eta 0.1 newbob_decay 0.5 eval_save_interval 100000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name path_to_validation_data nn_batch_size 1000 eval_limit 32000
[WIP] FEN/bin converter is not implemented
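For reference, a converter would have to parse the trainer's .bin format, which stores one position per 40-byte PackedSfenValue record (the same 40-byte figure appears in the shuffle notes below). Below is a minimal Python sketch that splits the raw fields under that assumed layout; train.bin is a hypothetical file name, and decoding the 32-byte Huffman-packed sfen into a FEN/SFEN string is exactly the part that is not implemented yet.

```python
import struct

# Assumed PackedSfenValue layout (40 bytes): 32-byte Huffman-packed sfen,
# int16 score, uint16 move, uint16 game ply, int8 game result, 1 padding byte.
RECORD = struct.Struct("<32shHHbx")

def read_bin(path):
    """Yield the raw fields of each 40-byte record in a .bin training file."""
    with open(path, "rb") as f:
        while (chunk := f.read(RECORD.size)) and len(chunk) == RECORD.size:
            yield RECORD.unpack(chunk)  # (packed_sfen, score, move, ply, result)

# Hypothetical usage: print the scalar fields of the first few positions.
for i, (packed_sfen, score, move, ply, result) in enumerate(read_bin("train.bin")):
    print(score, move, ply, result)
    if i >= 4:
        break
```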
Qhapaq's fine-tuning is an easy way to train a preferred strategy (especially an opening strategy). It has achieved several SOTA results; the most famous is the Apery-Qhapaq function, a fine-tuned evaluation function of Apery that is cited in the AlphaZero paper.
Qhapaq's fine-tuning uses game records from rating-measurement games. The number of positions is approximately 5 million. To avoid over-fitting, the learning rate and the number of iterations should be small. It is difficult to predict the strength of the trained evaluation function from the loss curve, so one needs to check the Elo rating carefully. Here is an example command, followed by a sample line of the training log.
learn path_to_training_data newbob_decay 0.5 validation_set_file_name path_to_validation_file nn_batch_size 50000 batchsize 1000000 eval_save_interval 8000000 eta 0.05 lambda 0.33 eval_limit 3000 newbob_num_trials 20 mirror_percentage 0
Tue Oct 16 23:50:01 2018, 39000007 sfens, iteration 39, eta = 0.05, hirate eval = 53 , test_cross_entropy_eval = 0.397738 , test_cross_entropy_win = 0.351142 , test_entropy_eval = 0.347285 , test_entropy_win = -6.48273e-07 , test_cross_entropy = 0.366519 , test_entropy = 0.201711 , norm = 4.7819e+07 , move accuracy = 38.715% , learn_cross_entropy_eval = 0.405593 , learn_cross_entropy_win = 0.373318 , learn_entropy_eval = 0.358318 , learn_entropy_win = -6.68275e-07 , learn_cross_entropy = 0.383969 , learn_entropy = 0.214361
[WIP] list all options and check whether they are implemented in stockfish-nnue
gensfen is a command to generate training data. The official document is here.
| option name | effect | Implemented in stockfish |
|---|---|---|
| depth | Depth of search | Yes |
| loop | Number of training positions | Yes |
| output_file_name | Name of the generated training data | Yes |
| eval_limit | Threshold to stop self-play | Unknown |
| write_minply | Minimum ply of positions to use. Useful to avoid duplication of training data | Unknown |
| write_maxply | Maximum ply of positions to use. Useful to avoid overly long draw games | Unknown |
| random_move_minply / random_move_maxply / random_move_count | Play random_move_count random moves between ply random_move_minply and random_move_maxply. The part of the playout before the random moves is discarded because the game result is not meaningful for it | Unknown |
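For illustration only, an invocation combining the options above might look like this (all values are arbitrary examples, not recommendations):

gensfen depth 8 loop 10000000 output_file_name train.bin eval_limit 3000 write_minply 10 write_maxply 400 random_move_minply 1 random_move_maxply 24 random_move_count 5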
learn shuffle basedir BASE_DIR targetdir TARGET_DIR output_file_name OUTPUT_FILE_NAME [teacher game file name 1] [teacher game file name 2]
You can specify buffer_size after it: buffer_size BUFFER_SIZE. When shuffling, a temporary file is written to the tmp/ folder once for every buffer_size positions. For example, if buffer_size = 20000000 (20M), one buffer needs 20M * 40 bytes = 800MB of memory. On a PC with a small amount of memory, it is better to reduce this. However, if the number of files becomes too large, they cannot all be opened at the same time due to OS restrictions. Since Windows limits a process to 512 open files, 500 files are opened here; the current setting is therefore 500 files x 20M = 10G = 10 billion positions.
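As a trivial sanity check of the arithmetic above (again assuming 40 bytes per packed position):

```python
buffer_size = 20_000_000          # positions per temporary file
bytes_per_position = 40           # size of one PackedSfenValue record
print(buffer_size * bytes_per_position / 10**9)  # 0.8 GB of RAM per buffer
print(500 * buffer_size)          # 10,000,000,000 positions over 500 temp files
```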
qhapaq_49 : @nodchip Thank you for your comment. Do you have any recommendation about how much training data is required for initial training? In your research, 5 billion positions were used, but other research suggests 1 billion is enough. I think trying many options (e.g. the evaluation function for training-data generation, the initial positions for self-playout) is more important than increasing the number of positions.
nodchip : I recommend using more than 10 * (the number of NN parameters) fens for each iteration, including initial training. In the case of halfkp_256x2-32-32, there are more than 10M parameters, so I recommend using more than 150M fens for each iteration. But I don't know if this is the best number...
nodchip : Ah... I remember a method to find the optimal number of fens for training: check the loss value. We could prepare 100M fens, train with them, and check the final loss value (test_cross_entropy). Then we add 100M more, train, and check the final loss value again, and so on. If the final loss value does not decrease, there is nothing more to learn from the training data, and that is the optimal number of fens for initial training.
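As a rough cross-check of the parameter count mentioned above, here is a back-of-envelope calculation; the layer sizes are assumed from the standard chess HalfKP network, so treat it as an estimate rather than the exact figure.

```python
# Estimated parameter count of halfkp_256x2-32-32 (assumed layer sizes).
inputs = 64 * 641            # 41024 HalfKP input features per king half
ft  = inputs * 256 + 256     # feature transformer weights + biases
l1  = (2 * 256) * 32 + 32    # first hidden layer on the concatenated accumulator
l2  = 32 * 32 + 32           # second hidden layer
out = 32 * 1 + 1             # output neuron
print(ft + l1 + l2 + out)    # ~10.5M parameters, so the 10x rule gives ~105M fens
```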