Common Concerns and Misconceptions Surrounding Reward-Based Training
It’s something I hear a lot as a dog trainer who uses positive-reinforcement-based techniques: “Does this mean I have to give him a treat every time?” “How long do I have to carry treats with me?” “What if I don’t have any treats when X happens?” “There’s no way I can give him a treat that’s better than a dead squirrel!” These are legitimate concerns. I, too, don’t want to feel like I have to give my dog a cookie every time he sits for the rest of his life. That sounds hard to carry out and extremely limiting. Understanding the principles of learning and how positive reinforcement works should be empowering, but the idea of being controlled by whether or not you have a treat in your hand is anything but. If I truly understand the science of learning and how to read my learner, I realize that the real power isn’t just in those bits of hotdog.
What Is Positive Reinforcement?
I’m going to blow your mind here: positive reinforcement does not equal treats. Yes, treats are going to be the reinforcer I reach for most often for training a new behavior. This is for a couple of reasons. One, they’re easy to implement. I can dole them out efficiently, exactly when and where I want them. Two, dogs like them. (But Jill, my dog isn’t food motivated. Well, actually, your dog is living and breathing, so he is, but more on that later.) A necessary element for something to BE reinforcement is for the learner to find it desirable. The sciency term for this is appetitive stimulus. If I know that the dog will do stuff for the bits of hotdog, then that’s where I’m going to start. So, treats can equal reinforcement, but they’re not the only thing.
Positive reinforcement is the application of a desirable stimulus that increases the probability of a behavior. Anything the dog likes or has a natural inclination toward can be that desirable stimulus. When we define it this way, we see that we are not limited to treats. Toys, access to freedom, affection, words of praise, getting to say hi to another person or dog, sniffing, running, chasing squirrels: all of these things can be reinforcers. They are usually referred to as environmental reinforcers, and they can, and should, be a part of positive reinforcement training. Once I have established a new behavior using treats, I’m often able to find an environmental reinforcer to maintain the behavior. For example, I will train my dog to walk nicely at my side using treats in the beginning. Eventually, I will be able to use the pleasurable experience of getting to move forward, or the opportunity to sniff a bush, as reinforcement for loose leash walking. No more need to carry treats on every walk.
When I have established a behavior by means of positive reinforcement, I have built what’s called a reinforcement history for that behavior. I have ingrained in my dog’s brain that when he hears the cue to sit, and he does it, there is very likely going to be a reinforcer available to him. A new neural pathway now exists between perceiving the cue, doing the behavior, and experiencing good things. The power of reinforcement history is that, when I have built it, I’m not just relying on what I have in my hands at this moment to reward my dog. What I’m relying on is the mountains of times doing this behavior has paid off in the past. In fact, just the opportunity to hear and respond to a cue can be a reinforcing experience. For my visual learners, the arrow below represents the strength of the behavior over time as reinforcers are employed. Notice that the size of the reinforcer stays the same, but the "size" of the behavior builds.
What’s so great about this type of dog training, and what sets it apart from other methods, is that very fact. I can get my dog to walk on a loose leash even if I’m not waving chicken in front of his face. If I trained his leash manners using a prong collar, or even something as innocuous as a front-clip harness, my training would disappear as soon as I took that tool off. The dog would pull. A behavior trained using punishment or an aversive stimulus tends to fall apart very quickly once that threat goes away. The same is not true of behaviors trained with positive reinforcement.
Learn How To Be a Slot Machine
So no, you will not need to give your dog a treat every time he sits. In fact, if you don’t, you’re actually doing yourself a favor. If my dog knows that a treat didn’t come this time, but maybe it will next time, he will keep trying until that behavior pays off. In our human world, we see this play out with slot machines. Gambling isn’t addictive because you get a payout every time you pull the lever. Casinos wouldn’t exist if that were how they worked. The entire industry is built around the concept of variable reinforcement schedules. I don’t know when I’ll hit the next jackpot, but I hit it once before so maybe I’ll hit it again, and that unpredictability can have me sitting in that chair, inserting quarter after quarter until my cup is empty.
Unfortunately, humans have a hard time implementing this type of reinforcement schedule, and this is often where we see positive reinforcement training fall short.
Where It Goes Wrong
Where I most often see clients struggle is when the treat comes out at the wrong time. A classic example: I say, “Jack, come here!” Jack stares. I remember that I need to reinforce my recall, so I pull out the hunk of cheese I have in my treat pouch. Jack comes running. What’s happening here is not positive reinforcement, or even dog training. It is bribery. Very quickly, Jack learns, “Ah-ha! When Jill says ‘come,’ if I stay very still, I can train her to produce cheese. Good human!” Then what happens when I need Jack to come and I do not have cheese with me? Jack has no idea what I’m talking about, because he’s learned that “come” means “wait until you see cheese.” No cheese, no come. In order for the training to work, it must be behavior first, then reinforcer.
What’s Not Reinforced Will Go Away
Eventually you’ve built the reinforcement history you need, you’ve switched to a variable schedule of reinforcement to fade out the treats, and you’ve learned how to recognize and utilize other reinforcers to maintain that established behavior. Great job! But it’s important to also recognize that some tasks, some behaviors, are too challenging or too important not to reinforce heavily every time.
If we look at a human example, consider the arrival of your paycheck every two weeks. You work your butt off every single day for that one reward that you know is coming, and that’s why you’ll work for the next two weeks and the two weeks after that. Now imagine that payday rolls around and nothing enters your bank account. You find out there was an error in the system and your check will be a few days late. Understandable, things happen. But two more weeks go by and that paycheck never comes. You’re probably not going to keep doing the work for much longer. Depending on how much or how little you enjoy the work itself, you may stop doing the job even sooner than that.
Some behaviors are too important to stop paying for. There are some things that I cannot risk deteriorating. I will always give Jack a huge squeeze of peanut butter for coming when called because that behavior is essential to his safety and quality of life.
Treats are a part of positive reinforcement training, but they’re not everything.
Reinforcement history is powerful in maintaining behaviors once learned.
Variable reinforcement schedules are the slot machines of dog training.
If the treat comes out before the behavior, it’s bribery, not reinforcement.
Are there some things that I don’t always pay my dog for? Yes. Are there some behaviors that I will always pay my dog for? Yes.