The recommended settings according to the Gemma team are:
temperature = 0.95
top_p = 0.95
top_k = 64
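To show what the three settings do, here is a minimal Python sketch of a temperature / top-k / top-p sampler over a list of raw logits. It is only an illustration and does not show how llama.cpp or ollama implement sampling. The function name `sample_next_token` is hypothetical. This sketch applies temperature first and then top-k and top-p, but real runtimes may apply these steps in a different order.

```python
import math
import random

def sample_next_token(logits, temperature=0.95, top_k=64, top_p=0.95, rng=None):
    """Hypothetical helper: pick one token index from raw logits."""
    rng = rng or random.Random()
    # Temperature: dividing logits by a value < 1 sharpens the distribution.
    scaled = [l / temperature for l in logits]
    # Top-k: keep only the k highest-scoring token indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - m) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(ranked, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Draw proportionally from the kept tokens (implicitly renormalised by `mass`).
    mass = sum(p for _, p in kept)
    r = rng.random() * mass
    for idx, p in kept:
        r -= p
        if r <= 0:
            return idx
    return kept[-1][0]
```

Lowering `temperature` or `top_p` makes the draw more deterministic. That can help you check whether broken output comes from sampling rather than from the quant itself.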
Also beware of double BOS tokens! You can run my uploaded GGUFs with the recommended chat template and settings via ollama run hf.co/unsloth/gemma-3-27b-it-GGUF:Q4_K_M
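One way to guard against a doubled BOS is to collapse any run of leading BOS tokens into a single one before the prompt is evaluated. This is a minimal sketch, not the fix used in these GGUFs. It assumes the prompt is already tokenized into integer IDs and that Gemma's `<bos>` has ID 2. `collapse_leading_bos` is a hypothetical helper, so verify the ID against your tokenizer's config.

```python
def collapse_leading_bos(token_ids, bos_id=2):
    """Hypothetical helper: reduce a run of leading BOS tokens to exactly one.

    bos_id=2 is an assumption for Gemma's <bos>; check your tokenizer.
    """
    i = 0
    # Count how many BOS tokens sit at the start of the sequence.
    while i < len(token_ids) and token_ids[i] == bos_id:
        i += 1
    if i == 0:
        # No leading BOS: return the sequence unchanged (as a new list).
        return list(token_ids)
    # Keep a single BOS followed by the rest of the prompt.
    return [bos_id] + list(token_ids[i:])
```

For example, `[2, 2, 105]` becomes `[2, 105]`, while a prompt with a single BOS or none is left as it is.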
Daniel, as always, thanks for these. I had good results with your Q4_K_M quant on Mac / llama.cpp. However, on Linux/A100/ollama, there is something very wrong with your Q8_0 quant: the Python code it writes has indentation errors, missing closing parens, and quite a lot more that's bad. I ran both with your suggested command lines, but of course I could have made a mistake. I'm testing the bf16 on the A100 now to rule out a hardware issue, but my gut says there's a model or ollama sampling problem here.
Thanks for this, but I'm still unable to reproduce the results from Google AI Studio.
I tried your version, and when I ask it to create a Tetris game in Python, the resulting file has syntax errors. I see strange things like a space in the middle of a variable name/reference, or weird spacing in the code output.
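A quick way to confirm that an output is actually broken, rather than judging by eye, is to run it through Python's own parser. This sketch uses the standard-library `ast.parse`. The helper name `check_generated_python` is hypothetical. It only catches syntax problems such as indentation errors, unclosed parentheses, and most stray spaces inside identifiers. It does not catch logic bugs.

```python
import ast

def check_generated_python(source: str):
    """Hypothetical helper: None if `source` parses, else a short error description."""
    try:
        ast.parse(source)
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return f"{type(err).__name__} at line {err.lineno}: {err.msg}"
    return None
```

Running every generated file through a check like this makes it easier to compare quants or runtimes side by side.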