Meta developed and publicly released the Llama 2 family of large language models (LLMs): a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most tested benchmarks, and in human evaluations of helpfulness and safety they are on par with popular closed-source models such as ChatGPT and PaLM.
Python:

import requests
import json

url = "https://api.cyfuture.ai/aiapi/inferencing/response"

payload = {
    "model": "Model Name",
    "max_tokens": 16384,
    "top_p": 1,
    "top_k": 40,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "temperature": 0.6,
    "messages": [
        {"role": "user", "content": "Hello, how are you?"}
    ]
}

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_KEY>"
}

# Capture the response instead of discarding it
response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
print(response.json())
JavaScript:

const response = await fetch("https://api.cyfuture.ai/aiapi/inferencing/response", {
  method: "POST",
  headers: {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_KEY>"
  },
  body: JSON.stringify({
    model: "Model Name",
    max_tokens: 16384,
    top_p: 1,
    top_k: 40,
    presence_penalty: 0,
    frequency_penalty: 0,
    temperature: 0.6,
    messages: [{ role: "user", content: "Hello, how are you?" }]
  })
});
const data = await response.json();
Java:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

URI uri = URI.create("https://api.cyfuture.ai/aiapi/inferencing/response");
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
    .uri(uri)
    .header("Accept", "application/json")
    .header("Content-Type", "application/json")
    .header("Authorization", "Bearer <API_KEY>")
    .POST(HttpRequest.BodyPublishers.ofString("""
        {"model":"Model Name","max_tokens":16384,"top_p":1,"top_k":40,"presence_penalty":0,"frequency_penalty":0,"temperature":0.6,"messages":[{"role":"user","content":"Hello, how are you?"}]}
        """))
    .build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
On-demand deployments allow you to use Llama 2 70B on dedicated GPUs with Cyfuture AI's high-performance serving stack, with high reliability and no rate limits.
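An on-demand Llama 2 70B deployment can be exercised with the same request shape shown above. Below is a minimal Python sketch; the model identifier "llama-2-70b-chat" is an assumption for illustration, so check the Cyfuture AI model catalog for the exact name your deployment exposes.

```python
import json
import os

ENDPOINT = "https://api.cyfuture.ai/aiapi/inferencing/response"

payload = {
    "model": "llama-2-70b-chat",  # hypothetical identifier; confirm in the model catalog
    "max_tokens": 16384,
    "top_p": 1,
    "top_k": 40,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "temperature": 0.6,
    "messages": [
        {"role": "user", "content": "Summarize Llama 2 in one sentence."}
    ],
}

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    # Read the key from the environment rather than hard-coding it
    "Authorization": f"Bearer {os.environ.get('CYFUTURE_API_KEY', '<API_KEY>')}",
}

# Only send the request when a real key is configured
if os.environ.get("CYFUTURE_API_KEY"):
    import requests
    response = requests.post(ENDPOINT, headers=headers, data=json.dumps(payload))
    print(response.json())
```

Keeping the API key in an environment variable avoids committing credentials to source control; everything else mirrors the request parameters used in the examples above.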