ChatGPT Cookbook 5: Techniques to improve reliability

How can we improve reliability on complex tasks? A model often fails if you just ask it a general question without any tricks, so we need to learn a few techniques that make it work better for us. The example below demonstrates the idea:

Use the following clues to answer the following multiple-choice question, using the following procedure:
(1) First, go through the clues one by one and consider whether the clue is potentially relevant
(2) Second, combine the relevant clues to reason out the answer to the question
(3) Third, map the answer to one of the multiple choice answers: either (a), (b), or (c)
 
Clues:
1. Miss Scarlett was the only person in the lounge.
2. The person with the pipe was in the kitchen.
3. Colonel Mustard was the only person in the observatory.
4. Professor Plum was not in the library nor the billiard room.
5. The person with the candlestick was in the observatory.
 
Question: Was Colonel Mustard in the observatory with the candlestick?
(a) Yes; Colonel Mustard was in the observatory with the candlestick
(b) No; Colonel Mustard was not in the observatory with the candlestick
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
 
Solution:
(1) First, go through the clues one by one and consider whether the clue is potentially relevant:
Clue 1: Miss Scarlett was the only person in the lounge. This clue is not relevant.
Clue 2: The person with the pipe was in the kitchen. This clue is not relevant.
Clue 3: Colonel Mustard was the only person in the observatory. This clue is relevant.
Clue 4: Professor Plum was not in the library nor the billiard room. This clue is not relevant.
Clue 5: The person with the candlestick was in the observatory. This clue is relevant.
 
(2) Second, combine the relevant clues to reason out the answer to the question:
Clue 3 and Clue 5 both say that the person with the candlestick was in the observatory. Colonel Mustard was the only person in the observatory, so Colonel Mustard was the person with the candlestick.
 
(3) Third, write the final answer:
(a) Yes; Colonel Mustard was in the observatory with the candlestick
Previously the model could not give the right answer; once given step-by-step instructions, it can.
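The three-step procedure above can also be assembled in code. The following is a minimal sketch, assuming a hypothetical helper named `build_procedure_prompt` (it is not part of any library). It only builds the prompt text and does not call a model.

```python
def build_procedure_prompt(clues, question, choices):
    """Assemble a step-by-step multiple-choice prompt (hypothetical helper)."""
    lines = [
        "Use the following clues to answer the following multiple-choice "
        "question, using the following procedure:",
        "(1) First, go through the clues one by one and consider whether "
        "the clue is potentially relevant",
        "(2) Second, combine the relevant clues to reason out the answer "
        "to the question",
        "(3) Third, map the answer to one of the multiple choice answers",
        "",
        "Clues:",
    ]
    # Number the clues starting from 1.
    lines += [f"{i}. {clue}" for i, clue in enumerate(clues, start=1)]
    lines += ["", f"Question: {question}"]
    # Label the choices (a), (b), (c), ...
    lines += [f"({chr(ord('a') + i)}) {c}" for i, c in enumerate(choices)]
    # End with a cue so the model starts writing its solution.
    lines += ["", "Solution:"]
    return "\n".join(lines)


prompt = build_procedure_prompt(
    clues=[
        "Miss Scarlett was the only person in the lounge.",
        "The person with the pipe was in the kitchen.",
        "Colonel Mustard was the only person in the observatory.",
        "Professor Plum was not in the library nor the billiard room.",
        "The person with the candlestick was in the observatory.",
    ],
    question="Was Colonel Mustard in the observatory with the candlestick?",
    choices=[
        "Yes; Colonel Mustard was in the observatory with the candlestick",
        "No; Colonel Mustard was not in the observatory with the candlestick",
        "Unknown; there is not enough information to determine whether "
        "Colonel Mustard was in the observatory with the candlestick",
    ],
)
print(prompt)
```

Keeping the procedure in one place makes it easy to reuse the same steps with a different set of clues.
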
A second example asks for a summary in the original language of the text:

Summarize the text using the original language of the text. The summary should be one sentence long.

Text:
"""
La estadística (la forma femenina del término alemán Statistik, derivado a su vez del italiano statista, "hombre de Estado")​ es una ciencia que estudia la variabilidad, colección, organización, análisis, interpretación, y presentación de los datos, así como el proceso aleatorio que los genera siguiendo las leyes de la probabilidad.​ La estadística es una ciencia formal deductiva, con un conocimiento propio, dinámico y en continuo desarrollo obtenido a través del método científico formal. En ocasiones, las ciencias fácticas necesitan utilizar técnicas estadísticas durante su proceso de investigación factual, con el fin de obtener nuevos conocimientos basados en la experimentación y en la observación. En estos casos, la aplicación de la estadística permite el análisis de datos provenientes de una muestra representativa, que busca explicar las correlaciones y dependencias de un fenómeno físico o natural, de ocurrencia en forma aleatoria o condicional.
"""

Summary:

With this prompt the model tends to lapse into English, so spell out the steps in the instruction: "First, identify the language of the text. Second, summarize the text using the original language of the text. The summary should be one sentence long."
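A small sketch of how the two-step instruction could be wrapped around any text. The helper name `build_summary_prompt` is hypothetical, and ending the prompt with a `Language:` cue is an assumption on my part, meant to nudge the model into answering the first step before it summarizes.

```python
def build_summary_prompt(text):
    """Wrap `text` in the two-step summarization instruction (hypothetical helper)."""
    instruction = (
        "First, identify the language of the text. "
        "Second, summarize the text using the original language of the text. "
        "The summary should be one sentence long."
    )
    quotes = '"' * 3  # the triple-quote delimiter used around the text above
    # The trailing "Language:" cue is an assumption, not taken from the notes.
    return f"{instruction}\n\nText:\n{quotes}\n{text}\n{quotes}\n\nLanguage:"


prompt = build_summary_prompt("La estadística es una ciencia formal deductiva.")
print(prompt)
```
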

Adding “Let’s think step by step” to the prompt can significantly improve accuracy on multi-step math problems (from roughly 17% to 80%). Although the trick works well on math problems, it is not effective on all tasks. The authors found that it was most helpful for multi-step arithmetic problems, symbolic reasoning problems, strategy problems, and other reasoning problems. It didn’t help with simple math problems or common sense questions, and presumably wouldn’t help with many other non-reasoning tasks either.
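In code, the trick amounts to appending the cue where the answer would begin. This is a minimal sketch: the `Q:`/`A:` layout and the helper name `add_step_by_step` are assumptions for illustration, and the juggler question is only a sample input.

```python
STEP_BY_STEP = "Let's think step by step."


def add_step_by_step(question):
    """Return a zero-shot prompt that ends with the step-by-step cue."""
    return f"Q: {question.strip()}\nA: {STEP_BY_STEP}"


prompt = add_step_by_step(
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)
print(prompt)
```

Because the cue sits at the start of the answer, the model continues with its reasoning before it states a result.
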

Using the IRS guidance below, answer the following questions using this format:
(1) For each criterion, determine whether it is met by the vehicle purchase
- {Criterion} Let's think step by step. {explanation} {yes or no, or if the question does not apply then N/A}.
(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the answer is likely {yes or no}."

IRS guidance:
"""
You may be eligible for a federal tax credit under Section 30D if you purchased a car or truck that meets the following criteria:
- Does the vehicle have at least four wheels?
- Does the vehicle weigh less than 14,000 pounds?
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged from an external source?
- Was the vehicle purchased in a year before 2022?
  - If so, has the manufacturer sold less than 200,000 qualifying vehicles? (Tesla and GM have sold more than 200,000 qualifying vehicles.)
- Was the vehicle purchased in a year after 2022?
  - If so, is the vehicle present in the following list of North American-assembled vehicles? (The only electric vehicles assembled in North America are the Audi Q5, BMW 330e, BMW X5, Chevrolet Bolt EUV, Chevrolet Bolt EV, Chrysler Pacifica PHEV, Ford Escape PHEV, Ford F Series, Ford Mustang MACH E, Ford Transit Van, GMC Hummer Pickup, GMC Hummer SUV, Jeep Grand Cherokee PHEV, Jeep Wrangler PHEV, Lincoln Aviator PHEV, Lincoln Corsair Plug-in, Lucid Air, Nissan Leaf, Rivian EDV, Rivian R1S, Rivian R1T, Tesla Model 3, Tesla Model S, Tesla Model X, Tesla Model Y, Volvo S60, BMW 330e, Bolt EV, Cadillac Lyriq, Mercedes EQS SUV, and Nissan Leaf.)
"""

Question: Can I claim a federal tax credit for my Toyota Prius Prime bought in 2021?

Solution:

(1) For each criterion, determine whether it is met by the vehicle purchase
- Does the vehicle have at least four wheels? Let's think step by step.
The Toyota Prius Prime has four wheels, so the answer is yes.
- Does the vehicle weigh less than 14,000 pounds? Let's think step by step. The Toyota Prius Prime weighs less than 14,000 pounds, so the answer is yes.
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged from an external source? Let's think step by step. The Toyota Prius Prime has a battery with at least 4 kilowatt hours that may be recharged from an external source, so the answer is yes.
- Was the vehicle purchased in a year before 2022? Let's think step by step. The Toyota Prius Prime was purchased in 2021, which is before 2022, so the answer is yes.
- Was the vehicle purchased in a year after 2022? N/A
- If so, is the vehicle present in the following list of North American-assembled vehicles? N/A

(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the answer is likely {yes or no}."

Because the Toyota Prius Prime meets all of the criteria for a federal tax credit, the answer is likely yes.
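One benefit of fixing the answer format is that the final verdict becomes easy to pull out of the completion. The following is a sketch using Python's standard `re` module; the helper name `extract_final_answer` is hypothetical.

```python
import re

# Matches the closing phrase requested in step (2) of the prompt.
FINAL_ANSWER = re.compile(r"the answer is likely\s+(yes|no)\b", re.IGNORECASE)


def extract_final_answer(completion):
    """Return 'yes' or 'no' from the last matching phrase, or None if absent."""
    matches = FINAL_ANSWER.findall(completion)
    return matches[-1].lower() if matches else None


completion = (
    "Because the Toyota Prius Prime meets all of the criteria for a "
    "federal tax credit, the answer is likely yes."
)
verdict = extract_final_answer(completion)
print(verdict)
```

Taking the last match guards against the phrase appearing earlier in the reasoning, and the template text `{yes or no}` does not match because of the brace.
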

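In the few-shot chain-of-thought approach, the prompt includes worked examples, and you write out the intermediate reasoning steps between each question and its answer (the part highlighted in blue in the original figure).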
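A few-shot chain-of-thought prompt can be sketched as follows. The helper name `build_few_shot_cot_prompt`, the dictionary keys, and the `Q:`/`A:` layout are assumptions for illustration; the worked example is a sample input, not a result.

```python
def build_few_shot_cot_prompt(examples, question):
    """Build a prompt whose examples include the reasoning, not just the answer."""
    parts = []
    for ex in examples:
        # The reasoning written between question and answer is the key part.
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['reasoning']} The answer is {ex['answer']}."
        )
    # The new question ends with an open "A:" for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


examples = [
    {
        "question": (
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        "reasoning": (
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11."
        ),
        "answer": "11",
    }
]
prompt = build_few_shot_cot_prompt(
    examples,
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?",
)
print(prompt)
```
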

In addition to doing poorly on long reasoning chains (where selection-inference shines), chain-of-thought prompting can especially struggle when the examples are short but the task is long.

Research into large language models is very active and evolving rapidly. Not only do researchers continue to improve the models, they also continue to improve our understanding of how to best employ the models.

An example prompt for writing unit tests:

In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.
```python
def is_palindrome(s):
    return s == s[::-1]
```

Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
- First,{GENERATED IN STEP 1}

A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `pytest` to make the tests easy to write and maintain
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way

`pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.

For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
-{GENERATED IN STEP 2}

[OPTIONALLY APPENDED]In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):
-{GENERATED IN STEP 2B}

Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
```python
import pytest  # used for our unit tests

def is_palindrome(s):
    return s == s[::-1]

# Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator
{GENERATED IN STEP 3}
```
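The template above is filled in over several steps, with each step's output appended to the prompt before the next request. The following is a rough sketch of that chaining. It assumes a hypothetical `complete` callable that maps a prompt string to generated text (no real API is used here), and the prompt wording is abridged from the template.

```python
FENCE = "`" * 3  # built this way so the sketch contains no literal code fence


def generate_unit_tests(function_source, complete):
    """Chain three prompts; `complete` is any callable from prompt text to output text."""
    # Step 1: ask the model to explain the function.
    prompt = (
        "In this advanced tutorial for experts, we'll use Python 3.9 and "
        "`pytest` to write a suite of unit tests to verify the behavior of "
        f"the following function.\n{FENCE}python\n{function_source}\n{FENCE}\n\n"
        "Before writing any unit tests, let's review what each element of the "
        "function is doing exactly and what the author's intentions may have "
        "been.\n- First,"
    )
    explanation = complete(prompt)

    # Step 2: append the explanation and ask for a plan of test scenarios.
    prompt += explanation + (
        "\n\nFor this particular function, we'll want our unit tests to "
        "handle the following diverse scenarios:\n-"
    )
    plan = complete(prompt)

    # Step 3: append the plan and ask for the test code itself.
    prompt += plan + (
        "\n\nBefore going into the individual tests, let's first look at the "
        "complete suite of unit tests as a cohesive whole.\n"
        f"{FENCE}python\nimport pytest\n"
    )
    tests = complete(prompt)
    return explanation, plan, tests


# Usage with a stand-in model that records the prompts it receives.
received = []


def fake_complete(prompt):
    received.append(prompt)
    return f" <output {len(received)}>"


source = "def is_palindrome(s):\n    return s == s[::-1]"
explanation, plan, tests = generate_unit_tests(source, fake_complete)
print(len(received))
```

Passing the model in as a callable keeps the chaining logic testable without network access.
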

Actual output in my ChatGPT environment (default parameters):

The function is_palindrome takes in a string as an input and returns a Boolean indicating whether the input string is a palindrome or not. A palindrome is a word, phrase, number, or other sequence of characters that reads the same forward and backward, ignoring spaces, punctuation, and capitalization. The function uses string slicing to reverse the input string and compares it to the original string to check if it’s a palindrome.

A good unit test suite for this function would include test cases for a wide range of possible inputs, including edge cases that the author may not have foreseen. The test suite should also take advantage of the features of pytest to make the tests easy to write and maintain, be easy to read and understand, with clean code and descriptive names, and be deterministic, so that the tests always pass or fail in the same way.

Some examples of diverse scenarios that the unit tests should handle include:

  • Testing the function’s behavior with valid palindrome strings of different lengths and characters
  • Testing the function’s behavior with non-palindrome strings
  • Testing the function’s behavior with empty strings
  • Testing the function’s behavior with strings that only contain spaces or special characters

Some examples of edge cases that should be considered include:

  • Testing the function’s behavior with strings that have leading or trailing spaces
  • Testing the function’s behavior with strings that contain numbers and special characters
  • Testing the function’s behavior with strings that have different capitalization

A possible complete suite of unit tests that uses pytest's @pytest.mark.parametrize decorator and test cases passed as tuple can look like this:

```python
import pytest

def is_palindrome(s):
    return s == s[::-1]

# Test valid palindrome strings
@pytest.mark.parametrize("test_input,expected", [
    ("madam", True),
    ("racecar", True),
    ("level", True)
])
def test_valid_palindrome(test_input, expected):
    assert is_palindrome(test_input) == expected

# Test non-palindrome strings
@pytest.mark.parametrize("test_input,expected", [
    ("hello", False),
    ("world", False),
    ("python", False)
])
def test_non_palindrome(test_input, expected):
    assert is_palindrome(test_input) == expected

# Test empty string
def test_empty_string():
    assert is_palindrome("") == False

# Test strings that only contain spaces or special characters
@pytest.mark.parametrize("test_input,expected", [
    ("  ", False),
    ("!", False),
    ("@", False)
])
def test_string_with_only_spaces_or_special_chars(test_input, expected):
    assert is_palindrome(test_input) == expected

# Test strings that have leading or trailing spaces
@pytest.mark.parametrize("test_input,expected", [
    (" madam", True),
    ("racecar ", True),
    (" level ", True)
])
def test_string_with_leading_or_trailing_spaces(test_input, expected):
    assert is_palindrome(test_input)
```
