← 2024-12-27 | 2025-01-05 →
08:51 billymg http://logs.bitdash.io/pest/2024-12-27#1034669 << yeah, i think this is true. but you could look into "fine tuning", that's something that's within the realm of a beefy home setup or rented server farm time
08:51 bitbot Logged on 2024-12-27 20:43:17 asciilifeform[jonsykkel|billymg]: afaik the irons req'd to actually train gptistic models de novo, tho, costs $maxint
08:53 billymg i think it's roughly taking an existing model and feeding it with enough domain-specific data that it can be used effectively for a given use case
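(A minimal sketch of what that kind of fine-tuning looks like in practice, assuming the Hugging Face transformers/peft/datasets stack and a LoRA adapter so only a small set of extra weights is trained; the base model name and domain_corpus.txt are placeholders, not anything from the log:)

    # LoRA fine-tune of an existing causal LM on a domain-specific corpus (sketch).
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "meta-llama/Llama-2-7b-hf"            # placeholder base model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Attach small trainable LoRA adapters instead of updating all weights.
    model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                             task_type="CAUSAL_LM"))

    # Domain-specific text, one example per line (placeholder path).
    data = load_dataset("text", data_files="domain_corpus.txt")["train"]
    data = data.map(lambda x: tok(x["text"], truncation=True, max_length=512),
                    batched=True)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="lora-out",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               num_train_epochs=1),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    ).train()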
~ 4 hours 27 minutes ~
13:20 discord_bridge (awtho) billymg: I built llama-cpp. I attempted to run a 17gb deepseek model, but it ended up freezing my macbook. I tried another deepseek model using LM Studio (which should be accessible via Cline) but it is very very slow.
~ 3 hours 12 minutes ~
16:33 billymg awt: is it an intel macbook pro or arm?
16:33 discord_bridge (awtho) Arm
16:34 billymg how much total ram?
16:35 billymg when you try running the model with llama-server you can open 'Activity Monitor' and see how much memory you have available
16:35 discord_bridge (awtho) 16 gb
16:35 discord_bridge (awtho) Is there a way I can safely do that without freezing my machine?
16:35 billymg ah, def not enough then for the 17gb model, it must have been swapping and that's what froze it
16:37 billymg considering the OS, browser, IDE, and whatever other random things are gonna take up at least 50% of your ram, i'd say your best bet is trying it out on your desktop PC (assuming it has the specs for it)
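(Back-of-the-envelope version of that reasoning, using the 17 GB model / 16 GB RAM figures from above; the KV-cache allowance and the 50% overhead estimate are rough assumptions:)

    # Will the model fit in RAM, or will it swap? (rough sketch)
    model_gb     = 17.0                # GGUF file size; loading needs roughly this much memory
    total_ram_gb = 16.0                # the macbook in question
    other_use_gb = total_ram_gb * 0.5  # OS + browser + IDE, per the estimate above
    kv_cache_gb  = 1.0                 # extra for context / KV cache (guess)

    needed = model_gb + kv_cache_gb
    free   = total_ram_gb - other_use_gb
    verdict = "fits" if needed <= free else "will swap and likely freeze"
    print(f"need ~{needed:.0f} GB, have ~{free:.0f} GB free -> {verdict}")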
16:38 billymg you can run it on a desktop pc and serve on your local network too, so your macbook's VS Code plugin will just be making requests to llama-server on your desktop
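(A quick way to check that setup from the macbook, assuming llama-server on the desktop was started with --host 0.0.0.0 --port 8080 and that the desktop's LAN address is 192.168.1.50, both placeholders; llama-server exposes an OpenAI-compatible chat endpoint, which is the sort of URL an editor plugin like Cline can be pointed at:)

    # Smoke-test llama-server over the local network (sketch).
    import requests

    resp = requests.post(
        "http://192.168.1.50:8080/v1/chat/completions",  # desktop's LAN address (placeholder)
        json={
            "messages": [{"role": "user", "content": "Say hello."}],
            "max_tokens": 16,
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])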
16:39 discord_bridge (awtho) Desktop has: 16 GB Radeon RX 6900 XT with 5120 stream processors, 128 GB ECC RAM.
~ 15 minutes ~
16:55 billymg that oughta be enough to get it going. i've only tried it with nvidia, but you can build it with HIP for AMD GPUs. llama-server then has a flag (-ngl, --gpu-layers) that lets you control how many layers to offload to VRAM
16:56 billymg it will exit if you exceed your available vram, so the idea is to increase the layer count until it fails
16:57 billymg the rest of the model will then load into your regular ram and the inferencing will happen on the CPU, so it will be slower but usable
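(One way to script that "increase -ngl until it fails" loop; the model path, step size, and 60-second cutoff are arbitrary assumptions, and it leans on llama-server exiting with an error when the requested layers don't fit in VRAM, as described above:)

    # Probe for the largest -ngl (GPU layer count) that llama-server will accept (sketch).
    import subprocess

    MODEL = "deepseek-model.gguf"   # placeholder path

    def starts_ok(ngl: int) -> bool:
        """Start llama-server with ngl layers offloaded and report whether it came up."""
        try:
            subprocess.run(
                ["llama-server", "-m", MODEL, "-ngl", str(ngl), "--port", "8080"],
                timeout=60, check=True, capture_output=True)
        except subprocess.CalledProcessError:
            return False   # exited with an error, e.g. out of VRAM
        except subprocess.TimeoutExpired:
            return True    # still serving after 60 s, so the load succeeded (probe process is killed)
        return True        # exited cleanly within the timeout; treat as a successful load

    for ngl in range(0, 81, 8):
        print(f"-ngl {ngl}:", "ok" if starts_ok(ngl) else "failed")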
17:03 discord_bridge (awtho) ty