Update README.md
README.md
@@ -37,9 +37,9 @@ software. Both of them are included in a single file, which can be
downloaded and run as follows:

```
wget https://huggingface.co/Mozilla/gemma-2-27b-it-llamafile/resolve/main/gemma-2-27b-it.Q6_K.llamafile
chmod +x gemma-2-27b-it.Q6_K.llamafile
./gemma-2-27b-it.Q6_K.llamafile
```
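
If the file refuses to execute directly (zsh and some Linux binfmt setups are known to balk at the APE executable format), the upstream llamafile documentation suggests launching it through `sh` instead; a workaround sketch, not specific to this model:

```
sh -c ./gemma-2-27b-it.Q6_K.llamafile
```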

The default mode of operation for these llamafiles is our new command

@@ -63,13 +63,13 @@ To instruct Gemma to do role playing, you can customize the system
prompt as follows:

```
./gemma-2-27b-it.Q6_K.llamafile --chat -p "you are mosaic's godzilla"
```

To view the man page, run:

```
./gemma-2-27b-it.Q6_K.llamafile --help
```

To send a request to the OpenAI API compatible llamafile server, try:

```
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "gemma-27b-it",
  "messages": [{"role": "user", "content": "Say this is a test!"}],
  "temperature": 0.0
}'
```
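
The response comes back as a standard OpenAI-style chat completion object, so the generated text lives at `.choices[0].message.content`. A quick sketch for extracting just that field, assuming `jq` is installed:

```
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma-27b-it",
        "messages": [{"role": "user", "content": "Say this is a test!"}],
        "temperature": 0.0
      }' |
  jq -r '.choices[0].message.content'
```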

If you don't want the chatbot and you only want to run the server:

```
./gemma-2-27b-it.Q6_K.llamafile --server --nobrowser --host 0.0.0.0
```
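
Passing `--host 0.0.0.0` binds the server on every network interface so other machines can reach it; leave it off to keep the server local. As a sketch of headless use, assuming the default port of 8080 is free:

```
# start the server in the background
./gemma-2-27b-it.Q6_K.llamafile --server --nobrowser &

# once the weights are loaded, the OpenAI-compatible endpoint responds
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-27b-it", "messages": [{"role": "user", "content": "Say this is a test!"}]}'
```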

An advanced CLI mode is provided that's useful for shell scripting. You
can use it by passing the `--cli` flag. For additional help on how it
may be used, pass the `--help` flag.

```
./gemma-2-27b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable
```

You then need to fill out the prompt / history template (see below).
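
The template section itself sits outside this diff, but Gemma's instruction-tuned weights use `<start_of_turn>`/`<end_of_turn>` turn delimiters, so a filled-out CLI invocation would look roughly like this sketch (the question text is purely illustrative):

```
./gemma-2-27b-it.Q6_K.llamafile --cli --log-disable \
  -p $'<start_of_turn>user\nWhy is the sky blue?<end_of_turn>\n<start_of_turn>model\n'
```
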
@@ -126,7 +126,7 @@ instead downloading the official llamafile release binary from
have the .exe file extension, and then saying:

```
.\llamafile-0.8.15.exe -m gemma-2-27b-it.Q6_K.llamafile
```

That will overcome the Windows 4GB file size limit, allowing you to

@@ -172,13 +172,19 @@ AMD64.
## About Quantization Formats

This model works well with any quantization format. Q6_K is the best
choice overall here.

## Testing

We tested that the Gemma 2 27B Q6_K llamafile produces nearly identical
responses to the Gemma 2 model hosted by Google on aistudio.google.com
when temperature is set to zero.



It is therefore our belief that the llamafile software faithfully
implements the Gemma model. If you should encounter any divergences,
then try using the BF16 weights, which have the original fidelity.
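
That zero-temperature setup is easy to re-check against a local server: with temperature set to zero, the same request should produce the same text on every run. A small sketch, assuming the server from above is running and `jq` is installed:

```
# issue the same zero-temperature request twice and compare the outputs
for i in 1 2; do
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma-27b-it",
         "messages": [{"role": "user", "content": "Say this is a test!"}],
         "temperature": 0.0}' |
    jq -r '.choices[0].message.content' > "run$i.txt"
done
diff run1.txt run2.txt && echo "responses are identical"
```
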
## See Also