If you're on this page, you're probably doing some research on B.F. Skinner and his work on operant conditioning and want to learn more.
Throughout our lives, we are conditioned to behave in certain ways. Our brains naturally gravitate toward the things that bring us pleasure and back away from things that bring us pain. When we connect our behaviors to pleasure and pain, we become conditioned.
When people are subjected to reinforcements (pleasure) and punishments (pain), they are undergoing operant conditioning. This article will describe operant conditioning, how it works, and how different schedules of reinforcement can increase the rate at which subjects perform a certain behavior.
What is Operant Conditioning?
Operant conditioning is a system of learning that happens by changing external variables called 'punishments' and 'rewards'. Through time and repetition, learning happens when an association is created between a certain behavior and the consequence of that behavior (good or bad).
You might also hear this concept described as “instrumental conditioning” or “Skinnerian conditioning.” This second term comes from B.F. Skinner, the behaviorist who discovered operant conditioning through his work with pigeons.
He created what is now known as the “Skinner box.” The box contained a lever, disc, or other sort of mechanism. When the lever was pulled or the disc was pressed, something would occur: food would appear, lights would flash, the floor would deliver an electric shock, etc.
Skinner placed pigeons inside these boxes and recorded their responses, observing whether they became conditioned by the consequences that followed a certain task.
Based on how the pigeons responded to the consequences of their actions, and how their behavior changed, Skinner developed the idea of operant conditioning.
We can unearth the definition of operant conditioning by breaking it down. Skinner defined an operant as any "active behavior that operates upon the environment to generate consequences."
Here’s a quick example. You tell your mother that she looks pretty. She responds by giving you a nice, big hug. You are encouraged to continue complimenting your mother, assuming you will get a nice hug after you do so.
In operant conditioning, you can change two variables to achieve two goals.
The variables you can change are adding a stimulus or removing a stimulus.
The goals you can achieve are increasing a behavior or decreasing a behavior.
Depending on what goal you're trying to achieve, and how you manipulate the variable, there are 4 methods of operant conditioning:
- Positive Reinforcement
- Negative Reinforcement
- Positive Punishment
- Negative Punishment
Trying to remember the types of operant conditioning can be difficult, but here's a simple cheat-sheet to help you.
Reinforcement means you're trying to increase the behavior.
Punishment means you're trying to decrease the behavior.
The positive prefix means you're adding a stimulus.
The negative prefix means you're removing a stimulus.
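If it helps to see the cheat-sheet as a lookup, here's a tiny sketch of the standard terminology, where "positive/negative" names the stimulus change and "reinforcement/punishment" names the goal. The function name and argument wording here are my own, invented purely for illustration:

```python
def conditioning_type(stimulus_change, goal):
    """Name the operant conditioning method for a given
    stimulus change ('add' or 'remove') and goal
    ('increase' or 'decrease' the behavior)."""
    prefix = "positive" if stimulus_change == "add" else "negative"
    kind = "reinforcement" if goal == "increase" else "punishment"
    return f"{prefix} {kind}"

print(conditioning_type("add", "increase"))     # → positive reinforcement
print(conditioning_type("remove", "decrease"))  # → negative punishment
```

Dessert after chores is `("add", "increase")`; taking away the video games is `("remove", "decrease")`.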
Positive reinforcement sounds redundant - isn’t all reinforcement positive? In psychology, the word “positive” doesn’t exactly mean what you think it means. The term “positive reinforcement” simply refers to the idea that you have added a stimulus to try to increase a behavior. Dessert after finishing your chores is positive reinforcement.
Negative reinforcement is the removal of a stimulus to reinforce a behavior. It’s not always a negative experience. Removing debt from your account is considered negative reinforcement. A night without chores is also negative reinforcement.
Under the umbrella of negative reinforcement are two concepts: escape and active avoidance. These types of negative reinforcement condition your behavior through the threat or existence of a “bad” stimulus.
Escape occurs when a subject “escapes” a bad stimulus. In early experiments concerning learned helplessness, Martin Seligman put dogs in a room and subjected them to recurring shocks. If the dogs crossed to the other side of the room, they wouldn’t be shocked anymore. This is a form of escape - the subject can escape bad stimuli with their behaviors.
Active Avoidance Learning
If you walked out into the cold without a coat, you would be faced with a punishment: you’re freezing and uncomfortable! The next time you go outside and wear a coat, you feel comfortable and warm. The behavior of putting on the coat allows you to actively avoid the “bad stimulus” and encourages you to wear a coat.
This example shows that not all forms of operant conditioning are due to someone’s intentions or manipulation. We learn to avoid or invite naturally-occurring stimuli based on what we observe in the aftermath of our behaviors.
Before I move on to the next form of operant conditioning, let me just sum up reinforcement. Remember, all types of reinforcement encourage you to repeat the actions that led to that reinforcement.
In operant conditioning, punishment is described as changing a stimulus to decrease the likelihood of a behavior. Like reinforcement, there are two types of punishment: positive and negative.
Positive punishment is not a positive experience - it discourages the subject from repeating their behaviors through the addition of a stimulus.
In The Big Bang Theory, Sheldon and the gang devise a plan to avoid getting off-topic. They decide to introduce a positive punishment to discourage that behavior.
The characters decide to put pieces of duct tape on their arms. When one of them gets off-topic, another person in the group rips the duct tape off that person’s arm as a form of operant conditioning. The addition of that painful feeling makes their scheme a form of positive punishment.
Negative punishment takes something away from the subject to help discourage behavior. If your parents ever took away your access to video games or toys because you were behaving badly, they were using negative punishment to discourage you from bad behavior.
Measuring Response and Extinction Rates
Getting spanked for bad behavior once is not going to stop you from trying to get away with bad behavior. Feeling cold outside and warmer once you put on a coat is not going to teach you to put on a coat every time you go outside.
Researchers use two measurements to determine the effectiveness of different operant conditioning schedules: response rate and extinction rate.
The Response Rate is how often the subject performs the behavior in order to receive the reinforcement.
The Extinction Rate is quite different. If the subject doesn’t trust that they will get a reinforcement for their behavior, or does not make the connection between the behavior and the consequence, they are likely to quit performing the behavior. The rate of extinction is the rate at which that behavior ends after reinforcements are not given.
How fast does operant conditioning happen? Can you manipulate response and extinction rates? The answer varies based on when and why you receive your reinforcement.
Skinner understood this. Throughout his research, he observed that the timing and frequency of reinforcement or punishment made a big impact on how quickly the subject learned to perform or refrain from a behavior. These factors also make an impact on response rate.
The different times and frequencies in which reinforcement is delivered can be identified by one of many schedules of reinforcement. Let’s look at those different schedules and how effective they are.
If you think about the simplest form of operant conditioning, you are probably thinking of continuous reinforcement. When the subject performs a behavior, they earn a reinforcement. This occurs every single time.
While response rate is fairly high in the beginning, extinction occurs as soon as the continuous reinforcement stops. If you earn dessert every single time you clean your room, you will clean your room when you want dessert. But if one day, you clean your room and don’t earn dessert, you will lose trust in the reinforcement and the behavior is likely to stop.
The next four reinforcement schedules are called partial reinforcement. Reinforcements are not delivered every single time a behavior is performed. Instead, reinforcements are distributed based on the amount of behaviors performed or the amount of time that passes.
Fixed ratio reinforcement
“Ratio” refers to the amount of responses. “Fixed” refers to a consistent amount. Put them together and you get a schedule of reinforcement with a consistent amount of responses. Rewards programs often use fixed ratio reinforcement schedules to encourage customers to keep coming back. For every ten smoothies, you get one free.
Every time you spend $100, you get $20 off on your next purchase. The free smoothie and the discount are both reinforcements distributed after a consistent amount of behaviors. It could take a subject two years or two weeks to reach that tenth smoothie - either way, the reinforcement is distributed after that tenth purchase.
The rate of response becomes more rapid as subjects experience fixed ratio reinforcement. Think about people in sales who work on commission. They know that they will get a $1,000 commission for every five items they sell - you can bet that they are pushing hard to sell those five items and earn that reinforcement faster.
Fixed interval reinforcement
Whereas “ratio” refers to the amount of responses, “interval” refers to the timing of the response. Subjects receive reinforcement after a certain amount of time has passed. If you get a paycheck on the 15th and 30th of every month, you are subject to fixed interval reinforcement. It doesn’t matter how many times you perform a behavior.
The response rate is typically slower in situations with fixed interval reinforcement. Subjects know that they will receive a reward no matter how often they perform a behavior. Often, people in jobs with steady and consistent paychecks are less likely to push hard and sell more product because they know they will get the same paycheck no matter how many items they sell. Other factors, like bonuses or verbal reprimands, may impact their motivation, but those extra factors don’t exist in pure fixed interval reinforcement.
Variable ratio reinforcement
When we talk about reinforcement schedules, “variable” means the requirement changes after each reinforcement is given - the number of responses (or the amount of time) needed to earn the next reinforcement varies.
Let’s go back to the example of the rewards card. On a variable ratio reinforcement schedule, the subject would receive their first free smoothie after buying ten smoothies. Once they get that first free smoothie, they only have to buy seven smoothies to get another free smoothie. After that reinforcement is distributed, the subject has to buy 15 smoothies to get a free smoothie. The ratio of reinforcement is variable.
This type of schedule isn’t always used because it can be confusing - in many cases, the subject does not know how many smoothies they have to purchase before they get their free one.
However, response rates are high for this type of schedule. The reinforcement is dependent on the subject’s behavior. By performing one more behavior, they know they are one step closer to their reward. If they don’t get the reinforcement, they can perform one more behavior and again become one step closer to getting the reinforcement.
Think of slot machines. You never know how many times you will need to pull the lever before you win the jackpot. But you know that with every pull, you are one step closer to winning. At some point, if you just keep pulling over and over, you will win the jackpot and receive a big reinforcement.
Variable interval reinforcement
The final reinforcement schedule identified by Skinner was that of variable interval reinforcement. By now, you can probably guess what this means. Variable interval reinforcement occurs when reinforcements are distributed after a certain amount of time has passed, but this amount varies after each reinforcement is distributed.
In this example, let’s say you work at a retail store. At any given time, secret shoppers enter the store. If you manage to perform the correct behaviors and sell the right items to the secret shopper, the higher-ups give you a bonus.
This could happen at any time as long as you are performing the behavior. This type of schedule keeps people on their toes, encouraging a high response rate and low extinction rate.
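For readers who like to tinker, the difference between fixed and variable ratio schedules can be sketched as a small simulation. This is a minimal illustration, not anything from Skinner's own work - the function names and the range of the random ratio are invented for the example:

```python
import random

def fixed_ratio(n):
    """Reinforce after every n-th response - the smoothie card."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcement delivered
        return False
    return respond

def variable_ratio(mean_n):
    """Reinforce after a random number of responses,
    averaging mean_n - the slot machine."""
    target = random.randint(1, 2 * mean_n - 1)
    count = 0
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

# A subject makes 100 responses on each schedule; count the rewards.
fr = fixed_ratio(10)
print(sum(fr() for _ in range(100)))  # → 10, exactly one per ten responses

vr = variable_ratio(10)
print(sum(vr() for _ in range(100)))  # roughly 10, but unpredictable
```

A fixed or variable *interval* schedule would gate the reward on elapsed time instead of a response count, but the shape of the sketch is the same. Notice why the slot machine hooks people: on the variable schedule, any given response might be the one that pays off.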
Operant Conditioning vs. Classical Conditioning
This process sounds very similar to the classical conditioning demonstrated in Pavlov’s famous dog experiments. You might be wondering, 'What's the difference between operant and classical conditioning?'
Classical conditioning ties an existing behavior (like salivating) to a stimulus (like the bell). Remember: Classical Connects.
Operant conditioning, however, trains an animal or human to perform or refrain from certain behaviors. You don’t train a dog to salivate. But you can train a dog to sit by giving him a treat every time he sits.
You Can Use Operant Conditioning In Your Life
We are used to forms of operant conditioning set up either by the natural world or by authority figures. But you can also use operant conditioning on yourself or with an accountabilibuddy.
Here’s how you can do it yourself. You set up a fixed ratio reinforcement schedule: for every 10 note cards that you write or memorize, you give yourself an hour of video games. You can set up a fixed interval reinforcement schedule: after every week of finals, you take a vacation.
Accountabilibuddies are best for setting up variable ratio and variable interval reinforcement schedules. That way, you don’t know when the reinforcement is coming. Tell your buddy to give you your video game controller back after a random number of note cards that you write. Or, ask them to walk into your room at random intervals of time. If you’re studying, they hand you a beer. If you’re not, no reinforcement.