Class GPULlama3StreamingChatModel
java.lang.Object
dev.langchain4j.model.gpullama3.GPULlama3StreamingChatModel
- All Implemented Interfaces:
StreamingChatModel, AutoCloseable
GPULlama3 implementation of the langchain4j StreamingChatModel interface.
This model provides streaming chat capabilities (partial responses are delivered to a StreamingChatResponseHandler) using the GPULlama3.java library, supporting both CPU and GPU execution modes. The model automatically separates thinking content from actual responses.
Example usage:
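GPULlama3StreamingChatModel model = GPULlama3StreamingChatModel.builder()
        .modelPath(Paths.get("path/to/model.gguf"))
        .temperature(0.7)
        .maxTokens(2048)
        .onGPU(true)
        .build();
ChatRequest chatRequest = ChatRequest.builder().messages(UserMessage.from("Hello!")).build();
// Streaming: chat(...) returns void and delivers output through the handler callbacks
model.chat(chatRequest, new StreamingChatResponseHandler() {
    @Override public void onPartialResponse(String partialResponse) { System.out.print(partialResponse); }
    @Override public void onCompleteResponse(ChatResponse completeResponse) { System.out.println(); }
    @Override public void onError(Throwable error) { error.printStackTrace(); }
});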
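The class description states that the model automatically separates thinking content from actual responses, but this reference does not say how. The sketch below shows one way a streaming splitter could do it, assuming the model wraps its reasoning in <think>...</think> tags (a common convention for reasoning models). ThinkingSplitter and its methods are hypothetical names for illustration; this is not the library's implementation. Because output arrives in chunks, a tag can straddle a chunk boundary, so the splitter holds back any trailing text that could be the start of a tag until the next chunk or finish() resolves it.

```java
import java.util.List;

// Hypothetical sketch, not the library's implementation: assumes reasoning text
// is wrapped in <think>...</think> and routes streamed chunks accordingly.
public class ThinkingSplitter {
    private static final String OPEN = "<think>";
    private static final String CLOSE = "</think>";

    private final StringBuilder pending = new StringBuilder(); // text not yet routed
    private final StringBuilder thinking = new StringBuilder();
    private final StringBuilder answer = new StringBuilder();
    private boolean inThink = false;

    /** Feeds one streamed chunk; a tag may be split across chunk boundaries. */
    public void accept(String chunk) {
        pending.append(chunk);
        while (true) {
            String tag = inThink ? CLOSE : OPEN;
            int idx = pending.indexOf(tag);
            if (idx >= 0) {
                // Route text before the tag, drop the tag itself, switch mode
                target().append(pending, 0, idx);
                pending.delete(0, idx + tag.length());
                inThink = !inThink;
            } else {
                // Route everything except a suffix that might be the start of the tag
                int emit = pending.length() - partialTagSuffix(tag);
                target().append(pending, 0, emit);
                pending.delete(0, emit);
                return;
            }
        }
    }

    /** Flushes held-back text once the stream has ended. */
    public void finish() {
        target().append(pending);
        pending.setLength(0);
    }

    public String thinking() { return thinking.toString(); }

    public String answer() { return answer.toString(); }

    private StringBuilder target() { return inThink ? thinking : answer; }

    /** Length of the longest suffix of pending that is a proper prefix of tag. */
    private int partialTagSuffix(String tag) {
        int max = Math.min(tag.length() - 1, pending.length());
        for (int len = max; len > 0; len--) {
            if (pending.substring(pending.length() - len).equals(tag.substring(0, len))) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        ThinkingSplitter splitter = new ThinkingSplitter();
        for (String chunk : List.of("<thi", "nk>Let me add.</th", "ink>2 + 2 = 4")) {
            splitter.accept(chunk);
        }
        splitter.finish();
        System.out.println("thinking: " + splitter.thinking()); // thinking: Let me add.
        System.out.println("answer: " + splitter.answer());     // answer: 2 + 2 = 4
    }
}
```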
-
Nested Class Summary
Nested Classes
-
Method Summary
- builder()
- void close(): Closes the model and releases all associated resources.
- void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler)
- void freeTornadoVMGPUResources(): Manually releases GPU resources allocated by TornadoVM.
- org.beehive.gpullama3.model.Model getModel()
- org.beehive.gpullama3.inference.sampler.Sampler getSampler()
- void init(Path modelPath, Double temperature, Double topP, Integer seed, Integer maxTokens, Boolean onGPU)
- modelResponse(ChatRequest request, IntConsumer tokenConsumer): Generates a chat response from the model.
- void printLastMetrics()
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface StreamingChatModel
chat, chat, chat, defaultRequestParameters, listeners, provider, supportedCapabilities
-
Method Details
-
builder
-
doChat
public void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler)
- Specified by:
doChat in interface StreamingChatModel
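Callers normally do not invoke doChat directly: in LangChain4j 1.x, the public chat(...) methods of StreamingChatModel delegate to doChat. When a caller needs the complete reply before continuing, the handler callbacks can be bridged to a CompletableFuture. The helper below is a sketch under that assumption; StreamingChats and chatAndWait are hypothetical names. It accepts any StreamingChatModel, so it works with GPULlama3StreamingChatModel as well as with a test double.

```java
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical helper: streams partial output to onToken and blocks until the reply completes.
public final class StreamingChats {
    private StreamingChats() {}

    public static ChatResponse chatAndWait(StreamingChatModel model, String prompt, Consumer<String> onToken) {
        ChatRequest request = ChatRequest.builder()
                .messages(UserMessage.from(prompt))
                .build();
        CompletableFuture<ChatResponse> done = new CompletableFuture<>();
        // chat(...) delegates to the model's doChat(...) implementation
        model.chat(request, new StreamingChatResponseHandler() {
            @Override
            public void onPartialResponse(String partialResponse) {
                onToken.accept(partialResponse); // forward each streamed chunk
            }

            @Override
            public void onCompleteResponse(ChatResponse completeResponse) {
                done.complete(completeResponse);
            }

            @Override
            public void onError(Throwable error) {
                done.completeExceptionally(error);
            }
        });
        // join() rethrows a handler error wrapped in a CompletionException
        return done.join();
    }
}
```

Because the model implements AutoCloseable, a typical call opens it in a try-with-resources statement and passes it to chatAndWait together with, for example, System.out::print as the token consumer.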
-
init
-
getModel
public org.beehive.gpullama3.model.Model getModel()
-
getSampler
public org.beehive.gpullama3.inference.sampler.Sampler getSampler()
-
modelResponse
Generates a chat response from the model. Used by GPULlama3StreamingChatModel.
- Parameters:
request
tokenConsumer
- Returns:
-
printLastMetrics
public void printLastMetrics()
-
freeTornadoVMGPUResources
public void freeTornadoVMGPUResources()
Manually releases GPU resources allocated by TornadoVM.
This method can be called explicitly to free resources immediately, or will be called automatically when the model is garbage collected. It's safe to call this method multiple times.
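Being "safe to call multiple times" means the release is idempotent: only the first call does any work. The stand-alone sketch below illustrates that contract, and how a close() that delegates to it stays safe inside try-with-resources, using a hypothetical GpuResourceHolder class guarded by an AtomicBoolean. It does not call TornadoVM and is not the library's implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a GPU-backed model; no real device memory is involved.
public class GpuResourceHolder implements AutoCloseable {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private int releaseCount = 0; // counts how often the release body actually ran

    /** Frees resources on the first call; later calls do nothing. */
    public void freeResources() {
        // compareAndSet lets exactly one caller win, even across threads
        if (released.compareAndSet(false, true)) {
            releaseCount++;
            // a real implementation would free device buffers here
        }
    }

    /** close() delegates to the idempotent release, so try-with-resources stays safe. */
    @Override
    public void close() {
        freeResources();
    }

    public int releaseCount() {
        return releaseCount;
    }

    public static void main(String[] args) {
        GpuResourceHolder holder = new GpuResourceHolder();
        try (holder) {
            holder.freeResources(); // explicit early release
        } // close() runs here and is a no-op
        holder.freeResources(); // a repeated call is still safe
        System.out.println(holder.releaseCount()); // prints 1
    }
}
```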
-
close
public void close()
Closes the model and releases all associated resources.
This method implements AutoCloseable, allowing the model to be used with try-with-resources statements for automatic resource management.
- Specified by:
close in interface AutoCloseable
-