Class GPULlama3StreamingChatModel

java.lang.Object
dev.langchain4j.model.gpullama3.GPULlama3StreamingChatModel
All Implemented Interfaces:
StreamingChatModel, AutoCloseable

public class GPULlama3StreamingChatModel extends Object implements StreamingChatModel
GPULlama3 implementation of the langchain4j StreamingChatModel interface.

This model provides streaming chat capabilities using the GPULlama3.java library, supporting both CPU and GPU execution modes. The model automatically separates thinking content from actual responses.

Example usage:

GPULlama3StreamingChatModel model = GPULlama3StreamingChatModel.builder()
    .modelPath(Paths.get("path/to/model.gguf"))
    .temperature(0.7)
    .maxTokens(2048)
    .onGPU(true)
    .build();

// handler is a StreamingChatResponseHandler that receives partial and complete responses
model.chat(chatRequest, handler);
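
A streaming model reports its output through handler callbacks rather than a return value. The following is a minimal sketch of that callback flow using only the JDK. `SketchHandler` and `StreamingSketch` are hypothetical stand-ins written for illustration; they only mirror the method names of langchain4j's StreamingChatResponseHandler and are not part of this class or of GPULlama3.java.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in that mirrors the callback style of a streaming handler.
interface SketchHandler {
    void onPartialResponse(String partial);
    void onCompleteResponse(String complete);
    void onError(Throwable error);
}

class StreamingSketch {

    // Stand-in for a streaming call: emits each piece, then the full text.
    static void stream(String[] pieces, SketchHandler handler) {
        try {
            StringBuilder full = new StringBuilder();
            for (String piece : pieces) {
                handler.onPartialResponse(piece);
                full.append(piece);
            }
            handler.onCompleteResponse(full.toString());
        } catch (RuntimeException e) {
            handler.onError(e);
        }
    }

    // Bridges the callbacks to a future so a caller can wait for the result.
    static CompletableFuture<String> collect(String[] pieces) {
        CompletableFuture<String> done = new CompletableFuture<>();
        stream(pieces, new SketchHandler() {
            @Override
            public void onPartialResponse(String partial) {
                System.out.print(partial); // show text as it arrives
            }

            @Override
            public void onCompleteResponse(String complete) {
                done.complete(complete);
            }

            @Override
            public void onError(Throwable error) {
                done.completeExceptionally(error);
            }
        });
        return done;
    }

    public static void main(String[] args) {
        String result = collect(new String[] {"Hi", " there"}).join();
        System.out.println();
        System.out.println("complete: " + result);
    }
}
```

Completing a CompletableFuture from the final callback is one common way to wait for a streamed response in tests or command-line tools; an interactive application would more likely update its UI from the partial-response callback.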
  • Method Details

    • builder

      public static GPULlama3StreamingChatModel.Builder builder()
    • doChat

      public void doChat(ChatRequest chatRequest, StreamingChatResponseHandler handler)
      Specified by:
      doChat in interface StreamingChatModel
    • init

      public void init(Path modelPath, Double temperature, Double topP, Integer seed, Integer maxTokens, Boolean onGPU)
    • getModel

      public org.beehive.gpullama3.model.Model getModel()
    • getSampler

      public org.beehive.gpullama3.inference.sampler.Sampler getSampler()
    • modelResponse

      public String modelResponse(ChatRequest request, IntConsumer tokenConsumer)
      Generates a chat response from the model. Used by GPULlama3StreamingChatModel.
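      Parameters:
      request - the chat request to generate a response for
      tokenConsumer - callback that receives each generated token as an int
      Returns:
      the generated response as a String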
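
The signature above pairs a String return value with an IntConsumer callback. As a rough illustration of how such a pairing can work, here is a JDK-only sketch. `TokenCallbackSketch`, its toy vocabulary, and the `generate` method are assumptions made for the example and do not reflect how GPULlama3.java decodes tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical stand-in for a token-by-token generator; not the GPULlama3 API.
class TokenCallbackSketch {

    // Toy vocabulary mapping token id to text piece (illustrative values only).
    static final String[] VOCAB = {"Hello", ",", " world", "!"};

    // Same shape as modelResponse(request, tokenConsumer): each token id is
    // handed to the consumer as it is produced, and the accumulated text is
    // returned once generation ends.
    static String generate(int[] tokenIds, IntConsumer tokenConsumer) {
        StringBuilder text = new StringBuilder();
        for (int id : tokenIds) {
            tokenConsumer.accept(id); // streaming-style notification
            text.append(VOCAB[id]);   // build up the final response
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        String response = generate(new int[] {0, 1, 2, 3}, seen::add);
        System.out.println(seen + " -> " + response);
    }
}
```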
    • printLastMetrics

      public void printLastMetrics()
    • freeTornadoVMGPUResources

      public void freeTornadoVMGPUResources()
      Manually releases GPU resources allocated by TornadoVM.

      This method can be called explicitly to free resources immediately; otherwise it is called automatically when the model is garbage collected. It is safe to call this method multiple times.

    • close

      public void close()
      Closes the model and releases all associated resources.

      This method implements AutoCloseable, allowing the model to be used with try-with-resources statements for automatic resource management.

      Specified by:
      close in interface AutoCloseable
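
The resource-management behavior described for freeTornadoVMGPUResources and close can be sketched with a small stand-in class. `ClosableModelSketch` is hypothetical and uses only the JDK. It assumes, as the documentation above suggests but does not show, that close delegates to an idempotent release step.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a model that owns GPU resources; not the real class.
class ClosableModelSketch implements AutoCloseable {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private final AtomicInteger releaseCount = new AtomicInteger();

    // Idempotent release: only the first call does any work.
    void freeResources() {
        if (released.compareAndSet(false, true)) {
            releaseCount.incrementAndGet(); // real code would free device buffers here
        }
    }

    @Override
    public void close() {
        freeResources();
    }

    int releaseCount() {
        return releaseCount.get();
    }

    // Explicit release inside the block, then the implicit close() on exit.
    static int demo() {
        ClosableModelSketch model = new ClosableModelSketch();
        try (ClosableModelSketch m = model) {
            m.freeResources();
        }
        return model.releaseCount();
    }

    public static void main(String[] args) {
        System.out.println("release count: " + demo());
    }
}
```

With the real class, the same try-with-resources shape would wrap the model returned by builder().build(), so resources are released on exit even if a chat call throws.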