Class GPULlama3StreamingChatModel
java.lang.Object
dev.langchain4j.model.gpullama3.GPULlama3StreamingChatModel
- All Implemented Interfaces:
StreamingChatModel, AutoCloseable
GPULlama3 implementation of the langchain4j StreamingChatModel interface.
This model provides streaming chat capabilities using the GPULlama3.java library, supporting both CPU and GPU execution modes. The model automatically separates thinking content from actual responses.
Example usage:
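    GPULlama3StreamingChatModel model = GPULlama3StreamingChatModel.builder()
            .modelPath(Paths.get("path/to/model.gguf"))
            .temperature(0.7)
            .maxTokens(2048)
            .onGPU(true)
            .build();
    // A streaming model delivers its output through a handler, not a return value.
    // handler is a StreamingChatResponseHandler supplied by the caller.
    model.chat(chatRequest, handler);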
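A handler passed to a streaming model typically accumulates partial text and reacts to completion or failure. The sketch below illustrates that callback flow with stand-in types only: TokenHandler and fakeStream are hypothetical and are not part of langchain4j or GPULlama3.java. The callback names merely mirror those of StreamingChatResponseHandler, and the complete response is simplified to a String.

```java
import java.util.List;

class StreamingSketch {
    // Hypothetical producer: pushes each chunk to the handler, then signals completion.
    static void fakeStream(List<String> chunks, TokenHandler handler) {
        StringBuilder full = new StringBuilder();
        try {
            for (String chunk : chunks) {
                full.append(chunk);
                handler.onPartialResponse(chunk);
            }
            handler.onCompleteResponse(full.toString());
        } catch (RuntimeException e) {
            handler.onError(e);
        }
    }

    // Collects the streamed chunks and marks how the stream ended.
    static String collect(List<String> chunks) {
        StringBuilder seen = new StringBuilder();
        fakeStream(chunks, new TokenHandler() {
            @Override public void onPartialResponse(String partial) { seen.append(partial); }
            @Override public void onCompleteResponse(String full) { seen.append(" [done]"); }
            @Override public void onError(Throwable error) { seen.append(" [error]"); }
        });
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("Hel", "lo", "!")));
    }
}

// Hypothetical stand-in for a streaming response handler; not the langchain4j interface.
interface TokenHandler {
    void onPartialResponse(String partial);
    void onCompleteResponse(String full);
    void onError(Throwable error);
}
```

With the real model, the same three callbacks would be implemented on a StreamingChatResponseHandler and passed to chat together with the request.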
Nested Class Summary
Nested Classes
Method Summary
Modifier and Type | Method | Description
 | builder() |
void | close() | Closes the model and releases all associated resources.
void | doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler) |
void | freeTornadoVMGPUResources() | Manually releases GPU resources allocated by TornadoVM.
org.beehive.gpullama3.model.Model | getModel() |
org.beehive.gpullama3.inference.sampler.Sampler | getSampler() |
void | init(Path modelPath, Double temperature, Double topP, Integer seed, Integer maxTokens, Boolean onGPU) |
 | modelResponse(ChatRequest request, IntConsumer tokenConsumer) | Generates a chat response from the model.
void | printLastMetrics() |
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface StreamingChatModel
chat, chat, chat, defaultRequestParameters, listeners, provider, supportedCapabilities
Method Details
builder
doChat
- Specified by:
doChat in interface StreamingChatModel
init
getModel
public org.beehive.gpullama3.model.Model getModel()
getSampler
public org.beehive.gpullama3.inference.sampler.Sampler getSampler()
modelResponse
Generates a chat response from the model. Used by GPULlama3StreamingChatModel.
- Parameters:
request -
tokenConsumer -
- Returns:
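The tokenConsumer parameter is a java.util.function.IntConsumer, so the caller receives each value as an int. How the model invokes it is not documented on this page. The sketch below assumes it is called once per generated token ID. The generate method and the tiny vocabulary are hypothetical stand-ins, not GPULlama3.java APIs.

```java
import java.util.function.IntConsumer;

class TokenConsumerSketch {
    // Hypothetical vocabulary mapping token IDs to text pieces.
    static final String[] VOCAB = {"Hello", ",", " world", "!"};

    // Hypothetical generator: reports each token ID to the consumer and returns the full text.
    static String generate(int[] tokenIds, IntConsumer tokenConsumer) {
        StringBuilder out = new StringBuilder();
        for (int id : tokenIds) {
            tokenConsumer.accept(id);
            out.append(VOCAB[id]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        StringBuilder streamed = new StringBuilder();
        // The consumer decodes each ID as it arrives, which is how incremental output can be built.
        String full = generate(new int[] {0, 1, 2, 3}, id -> streamed.append(VOCAB[id]));
        System.out.println(streamed + " / " + full);
    }
}
```

Passing a consumer keeps the generation loop unaware of what the caller does with each token, whether that is printing, buffering, or forwarding to a streaming handler.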
printLastMetrics
public void printLastMetrics()
freeTornadoVMGPUResources
public void freeTornadoVMGPUResources()
Manually releases GPU resources allocated by TornadoVM.
This method can be called explicitly to free resources immediately, or will be called automatically when the model is garbage collected. It's safe to call this method multiple times.
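Being safe to call repeatedly usually means the release logic is guarded so that only the first call does any work. One common way to get that behavior is an AtomicBoolean flag, sketched below. GpuResource and its counter are hypothetical, and this is not necessarily how GPULlama3StreamingChatModel implements the guarantee.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class GpuResource {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private int releaseCount = 0;

    // Releases the (simulated) resource only on the first call; later calls are no-ops.
    void free() {
        if (released.compareAndSet(false, true)) {
            releaseCount++;
        }
    }

    // Number of times the underlying release actually ran.
    int releaseCount() {
        return releaseCount;
    }

    public static void main(String[] args) {
        GpuResource resource = new GpuResource();
        resource.free();
        resource.free();
        System.out.println(resource.releaseCount());
    }
}
```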
close
public void close()
Closes the model and releases all associated resources.
This method implements AutoCloseable.close(), allowing the model to be used with try-with-resources statements for automatic resource management.
- Specified by:
close in interface AutoCloseable
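Because the class implements AutoCloseable, a try-with-resources statement closes it even when the body throws. The sketch below shows that guarantee with a hypothetical DemoModel stand-in rather than the real model, which needs a GGUF file and the GPULlama3.java runtime to construct.

```java
class TryWithResourcesSketch {
    // Hypothetical stand-in for a model that holds resources until closed.
    static class DemoModel implements AutoCloseable {
        boolean closed = false;

        String chat(String prompt) {
            if (prompt.isEmpty()) {
                throw new IllegalArgumentException("empty prompt");
            }
            return "echo: " + prompt;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Runs one chat call and reports whether the model was closed afterwards.
    static boolean closedAfter(String prompt) {
        DemoModel ref = new DemoModel();
        try (DemoModel model = ref) {
            model.chat(prompt);
        } catch (IllegalArgumentException e) {
            // close() has already run by the time this catch block executes.
        }
        return ref.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfter("hi") + " " + closedAfter(""));
    }
}
```

The same pattern applies to a model created through builder(): declaring it in the try header removes the need for an explicit close() call.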