08:51 |
billymg |
http://logs.bitdash.io/pest/2024-12-27#1034669 << yeah, i think this is true. but you could look into "fine tuning", that's something within the realm of a beefy home setup or rented server farm time
08:51 |
bitbot |
Logged on 2024-12-27 20:43:17 asciilifeform[jonsykkel|billymg]: afaik the irons req'd to actually train gptistic models de novo, tho, costs $maxint |
08:53 |
billymg |
i think it's roughly: taking an existing model and feeding it enough domain-specific data that it can be used effectively for a given use case
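For a concrete picture of what that looks like, here is a minimal sketch using the Hugging Face transformers + peft stack with a LoRA adapter; the base model name, the domain_data.jsonl file, and all hyperparameters are made-up placeholders rather than anything from this discussion:

    # Minimal LoRA fine-tuning sketch (assumes: pip install transformers peft datasets)
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    BASE = "meta-llama/Llama-2-7b-hf"   # placeholder base model
    tok = AutoTokenizer.from_pretrained(BASE)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(BASE)

    # LoRA trains small adapter matrices instead of all the weights,
    # which is what keeps the hardware requirements sane.
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections; model-dependent
        task_type="CAUSAL_LM"))

    # Domain-specific corpus: one {"text": "..."} JSON record per line.
    data = load_dataset("json", data_files="domain_data.jsonl")["train"]
    data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
                    remove_columns=data.column_names)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    ).train()
    model.save_pretrained("out/lora-adapter")

The adapter approach only trains a few million extra parameters on top of the frozen base model, which is what puts it within reach of a single beefy GPU rather than a de novo training run.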
| |
~ 4 hours 27 minutes ~ |
13:20 |
discord_bridge |
(awtho) billymg: I built llama-cpp. I attempted to run a 17gb deepseek model, but it ended up freezing my macbook. I tried another deepseek model using LM Studio (which should be accessible via Cline) but it is very very slow. |
| |
~ 3 hours 12 minutes ~ |
16:33 |
billymg |
awt: is it an intel macbook pro or arm? |
16:33 |
discord_bridge |
(awtho) Arm |
16:34 |
billymg |
how much total ram? |
16:35 |
billymg |
when you try running the model with llama-server you can open 'Activity Monitor' and see how much memory you have available |
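For a programmatic alternative to Activity Monitor, a quick sketch using the psutil library (an assumption, not something mentioned here; the 17 GB figure is just the model size from earlier in the log):

    import psutil   # pip install psutil

    GIB = 1024 ** 3
    mem = psutil.virtual_memory()
    print(f"total: {mem.total / GIB:.1f} GiB, available: {mem.available / GIB:.1f} GiB")

    MODEL_SIZE_GIB = 17   # rough on-disk size of the model you want to load
    if mem.available < MODEL_SIZE_GIB * GIB:
        print("not enough free RAM; loading this model would push the OS into swap")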
16:35 |
discord_bridge |
(awtho) 16 gb |
16:35 |
discord_bridge |
(awtho) Is there a way I can safely do that without freezing my machine? |
16:35 |
billymg |
ah, def not enough then for the 17gb model, it must have been swapping and that's what froze it |
16:37 |
billymg |
considering the OS, plus browser, IDE, and whatever other random things are gonna take up at least 50% of your ram, i'd say your best bet is trying it out on your desktop PC (assuming that has the specs for it)
16:38 |
billymg |
you can run it on a desktop pc and serve it on your local network too, so your macbook's VS Code plugin will just be making requests to llama-server on your desktop
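llama-server exposes an OpenAI-compatible HTTP API (including /v1/chat/completions), so anything on the LAN can query it. A minimal client sketch, assuming the desktop sits at 192.168.1.50 and llama-server was started with --host 0.0.0.0 --port 8080; the address, port, and prompt are placeholders:

    import requests   # pip install requests

    resp = requests.post(
        "http://192.168.1.50:8080/v1/chat/completions",   # desktop's LAN address (placeholder)
        json={
            "messages": [{"role": "user", "content": "Summarize what LoRA fine-tuning is."}],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])

A VS Code plugin like Cline would presumably just be pointed at that same base URL instead of a cloud provider.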
16:39 |
discord_bridge |
(awtho) Desktop has: 16 GB Radeon RX 6900 XT with 5120 stream processors, 128 GB ECC RAM.
| |
~ 15 minutes ~ |
16:55 |
billymg |
that oughta be enough to get it going. i've only tried it with nvidia but you can build it with HIP for AMD GPUs. llama-server then has a flag, -ngl / --gpu-layers, that lets you control how many layers of the model to offload to VRAM
16:56 |
billymg |
it will exit if you exceed your available vram, so the idea is to increase -ngl until it fails
16:57 |
billymg |
the rest of the model will then load into your regular ram and the inferencing will happen on the CPU, so it will be slower but usable |
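A rough way to automate that increase-until-it-fails search, as a sketch: run the llama-cli binary from the same build with a one-token generation at increasing -ngl values and keep the highest value that exits cleanly. The binary path, model path, and step size are assumptions:

    import subprocess

    MODEL = "models/deepseek.gguf"   # placeholder path to the GGUF file
    best = 0
    for ngl in range(0, 100, 8):     # arbitrary upper bound and step
        r = subprocess.run(
            ["./llama-cli", "-m", MODEL, "-ngl", str(ngl), "-p", "hi", "-n", "1"],
            capture_output=True,
        )
        if r.returncode != 0:        # out of VRAM (or some other failure)
            break
        best = ngl
    print(f"highest -ngl that worked: {best}")

Whatever value survives is what you'd then pass to llama-server.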
17:03 |
discord_bridge |
(awtho) ty |