Examples of recipes for training NNUE in Shogi

[WIP] increase the number of recipes

nodchip's pre-trained model

Most existing NNUE evaluation functions in Shogi use nodchip's pre-trained model as a starting point. The recipe for nodchip's pre-trained model is as follows.

Step.1 Generate 5 billion training positions at search depth 8 using Apery

Apery's generation algorithm is the one most similar to AlphaZero's. The starting positions for self-play are generated from existing game records (human professional games) plus random moves.

Step.2 Learning

The following commands are used; note that the second run lowers both eta and lambda.

learn targetdir path_to_training_data loop 100 batchsize 1000000 lambda 1.0 eta 1.0 newbob_decay 0.5 eval_save_interval 500000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name path_to_validation_data nn_batch_size 1000 eval_limit 32000

learn targetdir path_to_training_data loop 100 batchsize 1000000 lambda 0.5 eta 0.1 newbob_decay 0.5 eval_save_interval 100000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name path_to_validation_data nn_batch_size 1000 eval_limit 32000

Qhapaq's fine-tuning

[WIP] FEN/bin converter is not implemented
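Until such a converter exists, a minimal Python sketch for inspecting the raw .bin records might look like the following. The 40-byte PackedSfenValue layout assumed here (a 32-byte Huffman-packed position followed by score, move, gamePly, result, and padding) is my reading of the YaneuraOu sources and should be verified; decoding the packed position itself is out of scope.

```python
import struct

def read_records(path):
    """Yield (packed_sfen, score, move, game_ply, result) from a .bin file.

    Assumes the 40-byte PackedSfenValue layout described above; the 32-byte
    packed position is returned undecoded.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(40)
            if len(chunk) < 40:
                break
            packed_sfen = chunk[:32]  # Huffman-packed position (left undecoded)
            score, move, game_ply, result = struct.unpack("<hHHb", chunk[32:39])
            yield packed_sfen, score, move, game_ply, result
```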

Qhapaq's fine-tuning is an easy way to train a preferred strategy (especially an opening strategy). It has achieved several SOTA results; the most famous is the Apery-Qhapaq function cited in the AlphaZero paper, which is a fine-tuned version of Apery's evaluation function.

Qhapaq's fine-tuning uses game records from rating calculations. The number of positions is approximately 5 million. To avoid over-fitting, the learning rate and the number of iterations should be small. It is difficult to predict the strength of the trained evaluation function from the loss curve, so one needs to check the Elo rating carefully. Here is an example command.

learn path_to_training_data newbob_decay 0.5 validation_set_file_name path_to_validation_file nn_batch_size 50000 batchsize 1000000 eval_save_interval 8000000 eta 0.05 lambda 0.33 eval_limit 3000 newbob_num_trials 20 mirror_percentage 0

Appendix: log of training in Qhapaq's fine-tuning

Tue Oct 16 23:50:01 2018, 39000007 sfens, iteration 39, eta = 0.05, hirate eval = 53 , test_cross_entropy_eval = 0.397738 , test_cross_entropy_win = 0.351142 , test_entropy_eval = 0.347285 , test_entropy_win = -6.48273e-07 , test_cross_entropy = 0.366519 , test_entropy = 0.201711 , norm = 4.7819e+07 , move accuracy = 38.715% , learn_cross_entropy_eval = 0.405593 , learn_cross_entropy_win = 0.373318 , learn_entropy_eval = 0.358318 , learn_entropy_win = -6.68275e-07 , learn_cross_entropy = 0.383969 , learn_entropy = 0.214361
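A sanity check on this log line: the combined losses appear to be the lambda-weighted mixture of the eval and win cross-entropies. With lambda 0.33 from the command above:

test_cross_entropy  = 0.33 * 0.397738 + 0.67 * 0.351142 ≈ 0.366519
learn_cross_entropy = 0.33 * 0.405593 + 0.67 * 0.373318 ≈ 0.383969

both of which match the reported values.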

  • Move accuracy of NNUE-Shogi against its training data will be around 38%.
  • Reference: the accuracy of LeelaChessZero's policy network against its training data is around 47%, while the accuracy of DeepLearningShogi's policy network is around 45%.

Tayayan (the 2020 champion)'s research

Original document

  • Improvement of the evaluation function with respect to the amount of training data saturates around 1 billion positions.
  • Improving the accuracy of the training data works. For example, if the n-th position has a lower winning percentage than both the (n-2)-th and (n+2)-th positions, the prediction is probably wrong (see the sketch after this list).
  • Training from nodchip's pre-trained function is important, possibly because the balance of the neurons (e.g. the percentage of neurons that contribute to early-game positions) is good.
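As an illustration of the second point, a consistency filter over a game's evaluations might look like the sketch below. This is a minimal, assumed implementation: win_prob and its scale constant are hypothetical, and the real pipeline works on the binary training-data format rather than a plain list of scores.

```python
import math

def win_prob(score, scale=600.0):
    """Map an eval score to a winning percentage (assumed sigmoid scale)."""
    return 1.0 / (1.0 + math.exp(-score / scale))

def keep_consistent(scores):
    """Return the indices of positions whose labels look locally consistent.

    Position n is dropped when its winning percentage is below both the
    (n-2)-th and (n+2)-th positions (same side to move), which suggests
    its label is probably wrong.
    """
    keep = []
    for n in range(len(scores)):
        if 2 <= n < len(scores) - 2:
            p = win_prob(scores[n])
            if p < win_prob(scores[n - 2]) and p < win_prob(scores[n + 2]):
                continue  # suspicious dip: skip this position
        keep.append(n)
    return keep
```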

Options of the generation command (gensfen)

[WIP] list all options and check whether they are implemented in stockfish-nnue

gensfen is a command to generate training data. The official documentation is here.

| option name | effect | Implemented in Stockfish |
| --- | --- | --- |
| depth | Depth of search | Yes |
| loop | Number of training positions to generate | Yes |
| output_file_name | Name of the generated training-data file | Yes |
| eval_limit | Evaluation threshold at which self-play is stopped | Unknown |
| write_minply | Minimum ply of positions to record; useful to avoid duplicated training data | Unknown |
| write_maxply | Maximum ply of positions to record; useful to avoid overly long drawn games | Unknown |
| random_move_minply / random_move_maxply / random_move_count | Play random_move_count random moves between ply random_move_minply and random_move_maxply; positions from before the random moves are discarded because the game result is not meaningful for them | Unknown |
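For illustration, a gensfen run comparable to the Step.1 recipe above (5 billion positions at depth 8) might look like the following; every value other than depth 8 and the 5-billion loop count is an illustrative assumption:

gensfen depth 8 loop 5000000000 output_file_name training_data.bin write_minply 10 write_maxply 300 random_move_minply 1 random_move_maxply 24 random_move_count 5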

Options of the learning command (learn)

shuffle

learn shuffle basedir BASE_DIR targetdir TARGET_DIR output_file_name OUTPUT_FILE_NAME [teacher game file name 1] [teacher game file name 2]

You can specify buffer_size after it: buffer_size BUFFER_SIZE. When shuffling, a temporary file is written to the tmp/ folder once for every buffer_size positions. For example, if buffer_size = 20000000 (20M), you need a buffer of 20M * 40 bytes = 800MB. On a PC with a small amount of memory it is better to reduce this. However, if the number of temporary files grows too large, they cannot all be opened at the same time due to OS restrictions: Windows limits a process to 512 open files, so the limit here is set to 500 files. The current setting therefore handles up to 500 files x 20M = 10G = 10 billion positions.
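For intuition, here is a minimal sketch of a two-pass external shuffle of 40-byte records. It illustrates the general technique only; the actual implementation differs in detail:

```python
import os
import random

RECORD_SIZE = 40  # bytes per PackedSfenValue, as noted above

def external_shuffle(src_path, dst_path, num_buckets=500):
    """Two-pass external shuffle of fixed-size records.

    Pass 1 scatters records uniformly at random into temporary bucket files;
    pass 2 shuffles each bucket in memory, so each bucket must fit in RAM
    (this is the role buffer_size plays in the real command).
    """
    os.makedirs("tmp", exist_ok=True)
    buckets = [open("tmp/bucket_%d.bin" % i, "wb") for i in range(num_buckets)]
    with open(src_path, "rb") as src:
        while True:
            record = src.read(RECORD_SIZE)
            if len(record) < RECORD_SIZE:
                break
            random.choice(buckets).write(record)  # pass 1: random scatter
    for b in buckets:
        b.close()
    with open(dst_path, "wb") as dst:
        for i in range(num_buckets):              # pass 2: per-bucket shuffle
            path = "tmp/bucket_%d.bin" % i
            with open(path, "rb") as f:
                data = f.read()
            records = [data[j:j + RECORD_SIZE]
                       for j in range(0, len(data), RECORD_SIZE)]
            random.shuffle(records)
            dst.writelines(records)
            os.remove(path)
```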

FYI: misc info

How much training data is required?

qhapaq_49 : @nodchip Thank you for your comment. Do you have any recommendation about how much training data is required for initial training? In your research, 5 billion positions were used, but other research suggests 1 billion is enough. I think trying many options (e.g. the evaluation function for training-data generation, the initial positions for self-play) is more important than increasing the number of positions.

nodchip : I recommend using more than 10 * (the number of NN parameters) fens for each iteration, including initial training. In the case of halfkp_256x2-32-32, there are more than 10m parameters. I recommend using more than 150m fens for each iteration. But I don't know if this is the best number...

nodchip : Ah... I remember a method to find the optimal number of fens for training: check the loss value. We could prepare 100m fens, train with them, and check the final loss value (test_cross_entropy). Then we add another 100m, train, and check the final loss value, again and again. If the final loss value no longer decreases, there is nothing left to learn from the training data, and that is the optimal number of fens for initial training.
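A hypothetical driver for this procedure might look like the following sketch, where train_and_get_loss is an assumed wrapper that runs the learn command on the first num_fens positions and parses the final test_cross_entropy from its log:

```python
def find_optimal_num_fens(train_and_get_loss, step=100_000_000,
                          max_fens=5_000_000_000):
    """Grow the training set in `step`-fen increments until the final
    test_cross_entropy stops decreasing, then return the last size that
    still improved the loss."""
    best_loss = float("inf")
    num_fens = step
    while num_fens <= max_fens:
        loss = train_and_get_loss(num_fens)
        if loss >= best_loss:
            return num_fens - step  # adding more fens no longer helps
        best_loss = loss
        num_fens += step
    return max_fens
```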