Dataset for the leaderboard results of different LLMs

Hi!

Are the results of different LLMs displayed on the leaderboard based on the dataset under the `evaluator/dataset` directory?
There appear to be only a few records in the public dataset. If so, how can such a small sample make the results shown on the leaderboard convincing?

Could you provide more details about the dataset?