
✍️ Create

Use the ✍️ Create endpoint to write text according to a provided prompt.

Available at

💸️ Pricing

You will be billed for the total number of tokens sent in your request plus the number of tokens generated by the API. Note that when using n_completions to return multiple possibilities, you will be charged for all of them.


curl -X 'POST' \
  '' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'X-API-KEY: YOUR_API_KEY' \
  -H 'X-Model: orion-fr' \
  -d '{"text": "Il était une fois", "params": {"mode": "nucleus", "n_tokens": 25, "p": 0.9}}'
Response (JSON)
{
  "request_id": "969718f5-d1f4-40cd-a872-a6fb7bf84329",
  "outputs": [
    [
      {
        "input_text": "Il était une fois",
        "completions": [
          {
            "output_text": " une toute petite fille, qui avait une bonne voix, et chantait des chansons américaines telles que Tabernacle et We all take",
            "score": -82.67915979400277,
            "normalized_score": -3.307166391760111,
            "token_scores": null,
            "execution_metadata": {
              "cost": 25
            }
          }
        ],
        "execution_metadata": {
          "cost": 25
        }
      }
    ]
  ],
  "total_cost": 25
}
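As a minimal sketch of how you might build the request body and read the response from Python, the snippet below mirrors the curl example and walks the nested outputs array. It only uses the standard library and works on a copy of the sample response rather than calling the API; the helper name extract_completions is hypothetical and not part of the API.

```python
import json

# Request body mirroring the curl example above.
payload = {
    "text": "Il était une fois",
    "params": {"mode": "nucleus", "n_tokens": 25, "p": 0.9},
}
body = json.dumps(payload, ensure_ascii=False)

# Copy of the sample response shown above (not a live API call).
sample_response = json.loads("""
{
  "request_id": "969718f5-d1f4-40cd-a872-a6fb7bf84329",
  "outputs": [[{
    "input_text": "Il était une fois",
    "completions": [{
      "output_text": " une toute petite fille, qui avait une bonne voix, et chantait des chansons américaines telles que Tabernacle et We all take",
      "score": -82.67915979400277,
      "normalized_score": -3.307166391760111,
      "token_scores": null,
      "execution_metadata": {"cost": 25}
    }],
    "execution_metadata": {"cost": 25}
  }]],
  "total_cost": 25
}
""")


def extract_completions(response):
    """Return (input_text, output_text, score) for every completion.

    Hypothetical helper: it assumes the nesting seen in the sample
    response, i.e. outputs -> list -> entry -> completions.
    """
    rows = []
    for group in response["outputs"]:
        for entry in group:
            for completion in entry["completions"]:
                rows.append(
                    (entry["input_text"], completion["output_text"], completion["score"])
                )
    return rows


rows = extract_completions(sample_response)
for input_text, output_text, score in rows:
    print(f"{input_text!r} -> {output_text!r} (score={score:.2f})")
```
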


text string/array[string] ⚠️ required

The input(s) that will be used by the model for generation, also known as the prompt. They can be provided either as a single string or as an array of strings for batch processing.

n_tokens int 20

Number of tokens to generate. This can be overridden by a list of stop_words, which will cause generation to halt as soon as a word in that list is encountered.

⚠️ Maximum content length

Our models can process sequences of 1,024 tokens at most (length of prompt + n_tokens). Requests overflowing this maximum length will see their prompt truncated from the left to fit.
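The left truncation described above can be sketched as follows. This is an illustration only: it treats the prompt as an already tokenized list, and the function name truncate_prompt_left and the handling of an n_tokens value that fills the whole window are assumptions, not documented behavior.

```python
MAX_SEQUENCE_TOKENS = 1024  # limit stated above: prompt length + n_tokens


def truncate_prompt_left(prompt_tokens, n_tokens, max_len=MAX_SEQUENCE_TOKENS):
    """Keep only the rightmost prompt tokens that fit next to n_tokens."""
    budget = max_len - n_tokens
    if budget <= 0:
        # Assumption: if n_tokens alone fills the window, no prompt is kept.
        return []
    # Slicing from the end drops the oldest (leftmost) tokens first.
    return prompt_tokens[-budget:]


tokens = list(range(1100))  # stand-in for a 1,100-token prompt
kept = truncate_prompt_left(tokens, n_tokens=100)
print(len(kept), kept[0], kept[-1])
```

In practice this means the end of a long prompt is what the model sees, so put the most important context last.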

n_completions int 1

Number of different completion proposals to return for each prompt.

💸️ Additional costs

You will be charged for the total number of tokens generated, that is n_completions * n_tokens, so stay reasonable!

best_of int null ⚠️ smaller than n_completions

Of the n_completions generated, only the best_of most likely ones are returned. Likelihood is measured by summing the log-likelihood over all generated tokens.


See the sampling entry for more details.
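The selection rule for best_of can be illustrated with a short sketch. It assumes each candidate comes with the log-likelihood of every generated token; the function name select_best and the input shape are hypothetical and do not reflect how the service is implemented.

```python
def select_best(candidates, best_of):
    """Return the texts of the best_of candidates with the highest summed log-likelihood.

    candidates: list of (text, token_logprobs) pairs.
    """
    scored = [(sum(token_logprobs), text) for text, token_logprobs in candidates]
    # Higher (less negative) summed log-likelihood means more likely.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:best_of]]


candidates = [
    ("a", [-1.0, -2.0]),  # sum = -3.0
    ("b", [-0.5, -0.5]),  # sum = -1.0
    ("c", [-3.0, -3.0]),  # sum = -6.0
]
best = select_best(candidates, best_of=2)
print(best)
```
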

mode (greedy, topk, nucleus) nucleus

How the model will decide which token to select at each step.

  • Greedy: the model will always select the most likely token. This generation mode is deterministic and only suited for applications in which there is a ground truth the model is expected to return (e.g. question answering).
  • Nucleus: the model will only consider the most likely tokens with total probability mass p. We recommend this setting for most applications.
  • Top-k: the model will only consider the k most likely tokens.
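To make the three modes concrete, here is a rough sketch of which tokens remain candidates at a single step. It assumes the usual definitions (in particular, nucleus keeps the smallest set of most likely tokens whose cumulative probability reaches p); the function name candidate_tokens and the toy distribution are illustrative only.

```python
def candidate_tokens(probs, mode, k=5, p=0.9):
    """Return the tokens the sampler may pick from at one step.

    probs: dict mapping token -> probability at this step.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    if mode == "greedy":
        return ranked[:1]  # always the single most likely token
    if mode == "topk":
        return ranked[:k]  # the k most likely tokens
    if mode == "nucleus":
        kept, mass = [], 0.0
        for token in ranked:
            kept.append(token)
            mass += probs[token]
            if mass >= p:  # stop once the kept tokens cover mass p
                break
        return kept
    raise ValueError(f"unknown mode: {mode}")


step_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(candidate_tokens(step_probs, "greedy"))
print(candidate_tokens(step_probs, "topk", k=2))
print(candidate_tokens(step_probs, "nucleus", p=0.9))
```
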

temperature float 1. ⚠️ only in topk/nucleus mode

How risky the model will be in its choice of tokens. A temperature of 0 corresponds to greedy sampling; we recommend a value around 1 for most creative applications, and closer to 0 when a ground truth exists.
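The effect of temperature can be sketched under the common assumption that the model's raw scores are divided by the temperature before being turned into probabilities with a softmax. Whether the service applies it exactly this way is not stated here, and apply_temperature is a hypothetical name.

```python
import math


def apply_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        # A temperature of 0 corresponds to greedy selection, handled separately.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
cold = apply_temperature(logits, 0.1)  # close to greedy
warm = apply_temperature(logits, 1.0)  # flatter, more varied
print(cold[0] > warm[0])
```

Lower temperatures concentrate probability on the most likely token, which is why values near 0 behave almost like greedy mode.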

p float 0.9 ⚠️ only in nucleus mode

Total probability mass of the most likely tokens considered when sampling in nucleus mode.

k int 5 ⚠️ only in topk mode

Number of most likely tokens considered when sampling in top-k mode.


biases map<string, float> null

Bias the provided words to appear more or less often in the generated text. Values should be between -100 and +100, with negative values making words less likely to occur. Extreme values such as -100 will completely forbid a word, while values between 1 and 5 will make the word more likely to appear. We recommend experimenting to find a good fit for your use case.

💡 Avoiding repetitions

When generating longer samples with biases, the model may repeat positively biased words too often. Combine this option with presence_penalty and frequency_penalty to achieve the best results. If you generate a first completion and then use it as the prompt for a new completion, you probably want to turn off the bias encouraging a word once it has been produced, to avoid too much repetition.

⚙️ Technical details

The provided bias is directly added to the log-likelihood predicted by the model at a given step, before performing the sampling operation. You can use the top_logprobs option or the Analyse endpoint to access the log-probabilities of samples and get an idea of the range of likelihood values in your specific use case.

The bias is actually applied at the token level, and not at the word level. For words made of multiple tokens, the bias only applies to the first token (and may thus impact other words).
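As a rough illustration of the technical details above, the sketch below adds each bias to the log-likelihood of the matching token before any sampling takes place. The token-keyed dictionaries and the name apply_biases are assumptions made for the example; mapping words to their first token is left out.

```python
import math


def apply_biases(logprobs, biases):
    """Add each bias to the matching token's log-likelihood.

    Tokens without a bias are left unchanged; biases for tokens that are
    not candidates at this step are ignored.
    """
    return {token: lp + biases.get(token, 0.0) for token, lp in logprobs.items()}


step_logprobs = {"cat": -1.0, "dog": -2.0}
biased = apply_biases(step_logprobs, {"dog": 3.0, "bird": 5.0})
print(biased)

# A bias of -100 multiplies a token's unnormalized probability by exp(-100),
# which is why such a value effectively forbids the token.
print(math.exp(-100) < 1e-40)
```
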

presence_penalty float 0.

How strongly tokens are prevented from appearing again. This is a one-off penalty: tokens are penalized after their first appearance, but not further if they appear repeatedly (use frequency_penalty for that instead). Use values between 0 and 1. Values closer to 1 encourage more varied topics in the generated text.

⚙️ Technical details

Once a token has appeared at least once, presence_penalty is subtracted from its log-likelihood at every subsequent step.

frequency_penalty float 0.

How strongly tokens are prevented from appearing again if they have already appeared repeatedly. Unlike presence_penalty, this penalty scales with how often the token already occurs. Use values between 0 and 1. Values closer to 1 discourage repetition, which is especially useful in combination with biases.

⚙️ Technical details

frequency_penalty * n_T is subtracted from the log-likelihood of a token, where n_T is the number of times the token already occurs in the text.
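The two penalties can be sketched together as follows. This is a simplified illustration of the technical details above: it counts occurrences only in the list of tokens passed in (whether the prompt is counted too is not specified here), and the name penalize is hypothetical.

```python
from collections import Counter


def penalize(logprobs, previous_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Subtract presence and frequency penalties from token log-likelihoods."""
    counts = Counter(previous_tokens)
    adjusted = {}
    for token, lp in logprobs.items():
        n_t = counts[token]  # how many times the token already occurs
        if n_t > 0:
            lp -= presence_penalty  # one-off: same amount however often it appeared
        lp -= frequency_penalty * n_t  # grows with each occurrence
        adjusted[token] = lp
    return adjusted


step_logprobs = {"a": -1.0, "b": -1.0, "c": -1.0}
adjusted = penalize(
    step_logprobs,
    previous_tokens=["a", "a", "a", "b"],
    presence_penalty=0.5,
    frequency_penalty=0.25,
)
print(adjusted)
```

Here "a" (seen three times) is penalized more than "b" (seen once), while "c" (never seen) is untouched.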

stop_words array[string] null

Encountering any of these words will halt generation immediately.


concat_prompt bool false

The original prompt will be concatenated with the generated text.

return_logprobs bool false

Returns the log-probabilities of the generated tokens.

seed int null

Make sampling deterministic by setting a seed used for random number generation. Useful for strictly reproducing Create calls.
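The idea behind a seed can be shown with Python's own random number generator: two samplers created with the same seed draw the same sequence. This is a generic illustration, not the service's sampler, and sample_tokens is a hypothetical name.

```python
import random


def sample_tokens(tokens, weights, n, seed=None):
    """Draw n tokens according to weights, reproducibly when seed is set."""
    rng = random.Random(seed)  # a dedicated generator, independent of global state
    return rng.choices(tokens, weights=weights, k=n)


vocabulary = ["a", "b", "c"]
weights = [0.5, 0.3, 0.2]
first = sample_tokens(vocabulary, weights, n=10, seed=42)
second = sample_tokens(vocabulary, weights, n=10, seed=42)
print(first == second)
```
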


skill string null

Specify a 🤹 Skill to use to perform a specific task or to tailor the generated text.

Response (outputs)

An array of outputs shaped like your batch.

input_text string

The prompt used to generate the completions.

Completions (completions)

One entry for each completion requested through n_completions.

output_text string

Text generated by the model.

score float

Sum of the log-probabilities of the generated tokens.

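normalized_score float

Sum of the log-probabilities of the generated tokens, normalized by the length of the generated text. In the example response above, the normalized score equals the score divided by the 25 generated tokens (-82.68 / 25 ≈ -3.31).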
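The relationship between score and normalized_score can be checked with a small sketch. It assumes the length used for normalization is the number of generated tokens, which is consistent with the example response (25 tokens generated); completion_scores is a hypothetical helper, not an API function.

```python
def completion_scores(token_logprobs):
    """Return (score, normalized_score) from per-token log-probabilities."""
    score = sum(token_logprobs)
    normalized_score = score / len(token_logprobs)
    return score, normalized_score


scores = completion_scores([-1.0, -2.0, -3.0])
print(scores)

# Values taken from the example response: score, normalized_score, n_tokens.
example_score = -82.67915979400277
example_normalized = -3.307166391760111
print(abs(example_score / 25 - example_normalized) < 1e-9)
```

Because it is averaged per token, normalized_score is the more convenient value for comparing completions of different lengths.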

token_scores map<string, float>

Log-probability of each token generated in the completion.