If you're on this page, you're probably doing some research on B.F. Skinner and his work on operant conditioning! You might be surprised to see how much conditioning you go through each day. Our brains naturally gravitate toward the things that bring us pleasure and back away from things that bring us pain. When we connect our behaviors to pleasure and pain, we become conditioned.
When people are subjected to reinforcements (pleasure) and punishments (pain), they are undergoing operant conditioning. This article will describe operant conditioning, how it works, and how different schedules of reinforcement can increase the rate at which subjects perform a certain behavior.
What is Operant Conditioning?
Operant conditioning is a system of learning that works by changing external variables called 'punishments' and 'rewards'. Through time and repetition, learning happens when an association is created between a certain behavior and the consequence of that behavior (good or bad).
You might also hear this concept described as “instrumental conditioning” or “Skinnerian conditioning.” This second term comes from B.F. Skinner, the behaviorist who discovered operant conditioning through his work with pigeons.
He created what is now known as the “Skinner box.” The box contained a lever, disc, or other sort of mechanism. When the lever was pulled or the disc was pressed, something would occur: food would appear, lights would flash, the floor would deliver a mild electric shock, etc.
Skinner placed pigeons inside these boxes and recorded their responses, tracking whether the consequences that followed a certain task conditioned their behavior. Based on how the pigeons connected their actions to those consequences, and how their behavior changed as a result, Skinner developed the idea of operant conditioning.
How Does Operant Conditioning Work?
We can unearth the definition of operant conditioning by breaking it down. Skinner defined an operant as any "active behavior that operates upon the environment to generate consequences." Let’s say you get a big hug every time you tell your mother that she looks pretty. That compliment is an operant.
In operant conditioning, you can change two variables to achieve two goals.
The variables you can change are adding a stimulus or removing a stimulus.
The goals you can achieve are increasing a behavior or decreasing a behavior.
Depending on what goal you're trying to achieve, and how you manipulate the variable, there are four methods of operant conditioning:
- Positive Reinforcement
- Negative Reinforcement
- Positive Punishment
- Negative Punishment
Trying to remember the types of operant conditioning can be difficult, but here's a simple cheat-sheet to help you.
Reinforcement is increasing a behavior.
Punishment is decreasing a behavior.
The positive prefix means you're adding a stimulus.
The negative prefix means you're removing a stimulus.
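The cheat-sheet is really a two-by-two grid: one axis for the stimulus change, one for the goal. As an illustration (a sketch for this article, not anything from Skinner's own work), the mapping can be made explicit in Python:

```python
def conditioning_type(stimulus_change: str, goal: str) -> str:
    """Map a stimulus change ('add' or 'remove') and a goal
    ('increase' or 'decrease' a behavior) onto one of the
    four methods of operant conditioning."""
    prefix = {"add": "positive", "remove": "negative"}[stimulus_change]
    method = {"increase": "reinforcement", "decrease": "punishment"}[goal]
    return f"{prefix} {method}"

print(conditioning_type("add", "increase"))     # positive reinforcement
print(conditioning_type("remove", "decrease"))  # negative punishment
```

Any combination of the two variables and the two goals lands on exactly one of the four methods, which is the whole point of the cheat-sheet.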
Positive reinforcement sounds redundant - isn’t all reinforcement positive? In psychology, the word “positive” doesn’t exactly mean what you think it means. The term “positive reinforcement” simply refers to the idea that you have added a stimulus in order to try to increase a behavior. Dessert after finishing your chores is positive reinforcement.
Negative reinforcement is the removal of a stimulus to reinforce a behavior. It’s not always a negative experience. Removing debt from your account is considered negative reinforcement. A night without chores is also negative reinforcement.
Under the umbrella of negative reinforcement are two concepts: escape and active avoidance. These types of negative reinforcement condition your behavior through the threat or existence of a “bad” stimulus.
Escape occurs when a subject “escapes” a bad stimulus. In early experiments concerning learned helplessness, Martin Seligman put dogs in a room and subjected them to recurring shocks. If the dogs crossed to the other side of the room, they wouldn’t be shocked anymore. This is a form of escape - the subject can escape bad stimuli with their behaviors.
Active Avoidance Learning
If you walked out into the cold without a coat, you would be faced with a punishment: you’re freezing and uncomfortable! The next time you go outside and wear a coat, you feel comfortable and warm. The behavior of putting on the coat allows you to actively avoid the “bad stimulus” and encourages you to wear a coat.
This example shows that not all forms of operant conditioning are due to someone’s intentions or manipulation. We learn to avoid or invite naturally-occurring stimuli based on what we observe in the aftermath of our behaviors.
Before I move on to the next form of operant conditioning, let me just sum up reinforcement. Remember, all types of reinforcement encourage you to repeat the actions that led to that reinforcement.
In operant conditioning, punishment is described as changing a stimulus to decrease the likelihood of a behavior. Like reinforcement, punishment comes in two types: positive and negative.
Positive punishment is not a positive experience - it discourages the subject from repeating their behaviors through the addition of a stimulus.
In The Big Bang Theory, Sheldon and the gang try to devise a plan to avoid getting off-topic. They decide to introduce a positive punishment to discourage that behavior.
The characters decide to put pieces of duct tape on their arms. When one of them gets off-topic, another person in the group rips the duct tape off that person’s arm as a form of operant conditioning. The addition of that painful feeling makes their scheme a form of positive punishment.
Negative punishment takes something away from the subject to help discourage behavior. If your parents ever took away your access to video games or toys because you were behaving badly, they were using negative punishment to discourage you from bad behavior.
Measuring Response and Extinction Rates
Getting spanked for bad behavior once is not going to stop you from trying to get away with bad behavior. Feeling cold outside and warmer once you put on a coat is not going to teach you to put on a coat every time you go outside.
Researchers use two measurements to determine the effectiveness of different operant conditioning schedules: response rate and extinction rate.
The Response Rate is how often the subject performs the behavior in order to receive the reinforcement.
The Extinction Rate is quite different. If the subject doesn’t trust that they will get a reinforcement for their behavior, or does not make the connection between the behavior and the consequence, they are likely to quit performing the behavior. The rate of extinction is the rate at which that behavior ends after reinforcements are not given.
Schedules of Reinforcement
How fast does operant conditioning happen? Can you manipulate response and extinction rates? The answer varies based on when and why you receive your reinforcement.
Skinner understood this. Throughout his research, he observed that the timing and frequency of reinforcement or punishment made a big impact on how quickly the subject learned to perform or refrain from a behavior. These factors also make an impact on response rate.
The different times and frequencies in which reinforcement is delivered can be identified by one of many schedules of reinforcement. Let’s look at those different schedules and how effective they are.
If you think about the simplest form of operant conditioning, you are probably thinking of continuous reinforcement. When the subject performs a behavior, they earn a reinforcement. This occurs every single time.
While response rate is fairly high in the beginning, extinction occurs as soon as the continuous reinforcement stops. If you earn dessert every single time you clean your room, you will clean your room when you want dessert. But if one day, you clean your room and don’t earn dessert, you will lose trust in the reinforcement and the behavior is likely to stop.
The next four reinforcement schedules are called partial reinforcement. Reinforcements are not delivered every single time a behavior is performed. Instead, reinforcements are distributed based on the number of behaviors performed or the amount of time that passes.
Fixed ratio reinforcement
“Ratio” refers to the number of responses. “Fixed” refers to a consistent amount. Put them together and you get a schedule of reinforcement with a consistent number of responses. Rewards programs often use fixed ratio reinforcement schedules to encourage customers to keep coming back. For every ten smoothies, you get one free.
Every time you spend $100, you get $20 off on your next purchase. The free smoothie and the discount are both reinforcements distributed after a consistent number of behaviors. It could take a subject two years or two weeks to reach that tenth smoothie - either way, the reinforcement is distributed after that tenth purchase.
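Written out as a rule, a fixed ratio schedule is just counting: one reinforcement for every N responses, no matter how long those responses take. A minimal sketch, borrowing the ten-smoothie ratio from the rewards-card example:

```python
def reinforcements_earned(responses: int, ratio: int = 10) -> int:
    """Fixed ratio: one reinforcement per `ratio` responses,
    regardless of how much time those responses take."""
    return responses // ratio

print(reinforcements_earned(25))  # 2 free smoothies after 25 purchases
```

Note that time never appears in the rule - two weeks or two years of purchases earn exactly the same reinforcement.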
The rate of response becomes more rapid as subjects endure fixed ratio reinforcement. Think about people in sales who work on commission. They know that they will get a $1,000 paycheck for every five items they sell - you can bet that they are pushing hard to sell those five items and earn that reinforcement faster.
Fixed interval reinforcement
Whereas “ratio” refers to the number of responses, “interval” refers to the timing of the response. Subjects receive reinforcement after a certain amount of time has passed. If you get a paycheck on the 15th and 30th of every month, you are subject to fixed interval reinforcement. It doesn’t matter how many times you perform a behavior.
The response rate is typically slower in situations with fixed interval reinforcement. Subjects know that they will receive a reward no matter how often they perform a behavior. Often, people in jobs with steady and consistent paychecks are less likely to push hard and sell more product because they know they will get the same paycheck no matter how many items they sell. Other factors, like bonuses or verbal reprimands, may impact their motivation, but those extra factors don’t exist in pure fixed interval reinforcement.
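A fixed interval rule ignores the count of behaviors entirely; only elapsed time matters. Here is a sketch of that rule, assuming a paycheck every 15 days (the numbers are illustrative, taken loosely from the twice-monthly paycheck example):

```python
def paychecks_received(days_elapsed: int, behaviors: int,
                       interval: int = 15) -> int:
    """Fixed interval: reinforcement depends only on time passed;
    the `behaviors` count is deliberately ignored."""
    _ = behaviors  # has no effect on the outcome
    return days_elapsed // interval

print(paychecks_received(30, behaviors=5))    # 2
print(paychecks_received(30, behaviors=500))  # 2 - same paycheck either way
```

The unused `behaviors` parameter is the whole point: working a hundred times harder changes nothing, which is why response rates under this schedule tend to be slow.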
Variable ratio reinforcement
When we talk about reinforcement schedules, “variable” refers to a requirement that changes after each reinforcement is given.
Let’s go back to the example of the rewards card. On a variable ratio reinforcement schedule, the subject would receive their first free smoothie after buying ten smoothies. Once they get that first free smoothie, they only have to buy seven smoothies to get another free smoothie. After that reinforcement is distributed, the subject has to buy 15 smoothies to get a free smoothie. The ratio of reinforcement is variable.
This type of schedule isn’t always used because it can be confusing - in many cases, the subject does not know how many smoothies they have to purchase before they get their free one.
However, response rates are high for this type of schedule. The reinforcement is dependent on the subject’s behavior. By performing one more behavior, they know they are one step closer to their reward. If they don’t get the reinforcement, they can perform one more behavior and again become one step closer to getting the reinforcement.
Think of slot machines. You never know how many times you will need to pull the lever before you win the jackpot. But you know that with every pull, you are one step closer to winning. At some point, if you just keep pulling over and over, you will win the jackpot and receive a big reinforcement.
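The slot machine logic can be sketched as a loop: each pull independently pays off with a small probability, so the number of pulls needed varies unpredictably around an average. (The one-in-twenty odds below are an assumption for illustration, not real casino math.)

```python
import random

def pulls_until_jackpot(mean_ratio: int = 20, seed: int = 0) -> int:
    """Variable ratio: keep responding until a win; each pull pays
    off with probability 1/mean_ratio, so the required count varies
    from one reinforcement to the next."""
    rng = random.Random(seed)
    pulls = 0
    while True:
        pulls += 1
        if rng.random() < 1 / mean_ratio:
            return pulls
```

Because every pull might be the winner, there is never a "safe" moment to stop responding - which is exactly why this schedule produces such high response rates.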
Variable interval reinforcement
The final reinforcement schedule identified by Skinner was that of variable interval reinforcement. By now, you can probably guess what this means. Variable interval reinforcement occurs when reinforcements are distributed after a certain amount of time has passed, but this amount varies after each reinforcement is distributed.
For example, let’s say you work at a retail store. At any given time, secret shoppers enter the store. If you manage to perform the correct behaviors and sell the right items to the secret shopper, the higher-ups give you a bonus.
This could happen at any time as long as you are performing the behavior. This type of schedule keeps people on their toes, encouraging a high response rate and low extinction rate.
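A variable interval schedule looks much like the variable ratio one, except the randomness applies to the clock instead of the response count. In this sketch, the next secret-shopper visit lands a random stretch of time after the last one (the seven-day average is an assumption for illustration):

```python
import random

def next_visit_day(last_visit_day: float, mean_gap: float = 7.0,
                   seed: int = 1) -> float:
    """Variable interval: schedule the next check a random amount of
    time after the last one, so it can never be predicted."""
    rng = random.Random(seed)
    return last_visit_day + rng.uniform(0.5 * mean_gap, 1.5 * mean_gap)
```

Since the worker can't predict when the next check arrives, the only winning strategy is to perform the behavior all the time - the "keeps people on their toes" effect described above.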
FAQs About Operant Conditioning
Is Operant Conditioning Trial and Error?
Not exactly, although trial and error helped psychologists recognize operant conditioning. Through trial and error, it was discovered that reinforcements and rewards helped behaviors stick. These reinforcements (praise, treats, etc.) are the key to behaviors being performed and even repeated.
Is Operant Conditioning Behaviorism?
Behaviorism is an approach to psychology; think of operant conditioning as a theory under the umbrella of behaviorism. B.F. Skinner is considered one of the most important Behaviorists in the history of psychology. Theories like operant conditioning and classical conditioning have helped shape how people approach behavior for decades.
Operant Conditioning vs. Classical Conditioning
Classical conditioning ties existing behaviors (like salivating) to stimuli (like a bell). “Classical Connects.” Operant conditioning trains an animal or human to perform or refrain from certain behaviors. You don’t train a dog to salivate, but you can train a dog to sit by giving him treats when he sits.
Operant Conditioning vs. Instrumental Conditioning
Operant conditioning and instrumental conditioning are two terms for the same process. "Instrumental conditioning" is the older term, tracing back to Edward Thorndike's work on instrumental learning; "operant conditioning" is the term Skinner popularized. They do, however, differ from another type of conditioning: classical conditioning.
Can Operant Conditioning Be Used in the Classroom?
Yes! Intentionally rewarding students for their behavior is a form of operant conditioning. If a student receives praise every time they get an A, for example, they are more likely to strive for an A on their tests and quizzes.
Everyday Examples of Operant Conditioning
You can probably think of ways that you have used operant conditioning on yourself, your child, or your pets! Reddit users see operant conditioning everywhere, from video games to dating.
When you think about FFBE, what's the first thing that comes to mind? Most of you would probably answer CRYSTALS, PULLING, RAINBOWS, EVE! That's a clear example of Operant Conditioning. You wanna play the game every day and get that daily summon, because you know that you may get something awesome! And that's also the reason why the Rainbow rates are low -- if you won them too frequently, it would lose its effect.
Now that Mary knows the basketball player is in the game for fame, she uses this to her advantage. Every time he does something desirable, she uses this as a reinforcement for him to not only continue this behavior but also upgrade it. After their first date went well, they went to an event together. She knows he wants adulation and to feel important, so she puts the spotlight on him and makes him look good in front of others every time he goes out of his way to provide for her. This subconsciously makes him feel good, so he continues to provide her what she wants and needs (in her case: gifts, money, and affection).
Using Operant Conditioning On Yourself
We are used to forms of operant conditioning set up either by the natural world or by authority figures. But you can also use operant conditioning on yourself or with an accountabilibuddy.
Here’s how you can do it yourself. You set up a fixed ratio reinforcement schedule: for every 10 note cards that you write or memorize, you give yourself an hour of video games. You can set up a fixed interval reinforcement schedule: after every week of finals, you take a vacation.
Accountabilibuddies are best for setting up variable ratio and variable interval reinforcement schedules. That way, you don’t know when the reinforcement is coming. Tell your buddy to give you your video game controller back after a random amount of note cards that you write. Or, ask them to walk into your room at random intervals of time. If you’re studying, they hand you a beer. If you’re not, no reinforcement.