Real World Technologies - Forums - Thread: Branch/jump target prediction

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), August 10, 2016 8:44 pm

Heikki Kultala (hkultala.delete@this.iki.fi) on August 9, 2016 12:37 pm wrote:
>
> > A question that was raised, however, is about fixed length archs - can these architectures avoid use of BTB
> > entries for fixed-target branches by decoding such jumps early and redirecting fetch? That is, avoiding the
> > use of the BTB for branches whose targets are fixed in the
> > instruction, leaving the BTB resources for branches
> > which may actually vary (e.g., indirect jumps). Do any of the common fixed-length archs actually do this?
>
> Modern pipelines are way too long for this, the branch cannot be decoded
> early enough when just the i-cache access takes multiple clock cycles.

For jumps/branches that are relative to the instruction address and use largish offsets/insets, the block of instructions can be predecoded and the fetch chunk index of a possibly taken jump/branch stored in a smaller, faster portion of the Icache. If a chunk of instructions contains no possibly taken jump/branch, the fast portion would instead hold bits taken from a common location in the chunk. (For MIPS and Alpha this would also provide Icache space for a "BTB entry" for indirect jumps; Alpha even architects some of the bits as a hint.) This requires at least one extra bit per chunk to indicate whether the fast portion encodes an instruction address or holds the bits from the common location. With more state bits, other fast-access information could obviously be made available, e.g., a stack load offset.
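A minimal Python sketch of that predecode step follows. It rests on assumptions not in the post: a MIPS-like subset (J/JAL with a 26-bit pseudo-direct field, BEQ/BNE with a 16-bit word offset), a fetch chunk of two 32-bit instructions, a 16-bit fast field, and "fetch chunk index" read as the Icache index of the target's fetch chunk. The names `FastField`, `direct_target`, and `predecode_chunk` are hypothetical. Every direct branch is treated as possibly taken, since no direction predictor is modeled, and the "common location" is assumed to be the low 16 bits of the chunk's first instruction.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical MIPS-like subset: primary opcode in bits 31..26.
OP_J, OP_JAL, OP_BEQ, OP_BNE = 0x02, 0x03, 0x04, 0x05
INSN_BYTES = 4
CHUNK_INSNS = 2   # assumed: one fast field per two 32-bit instructions
FAST_BITS = 16    # assumed width of the fast-access field
CHUNK_BYTES = CHUNK_INSNS * INSN_BYTES


@dataclass
class FastField:
    holds_target: bool  # the extra state bit per chunk
    bits: int           # target chunk index, or bits moved from the common location


def direct_target(pc: int, insn: int) -> Optional[int]:
    """Return the target of a direct jump/branch at pc, or None."""
    op = insn >> 26
    if op in (OP_J, OP_JAL):
        # Pseudo-direct: concatenate with the upper PC bits, no adder needed.
        return ((pc + 4) & 0xF000_0000) | ((insn & 0x03FF_FFFF) << 2)
    if op in (OP_BEQ, OP_BNE):
        off = insn & 0xFFFF
        if off & 0x8000:  # sign-extend the 16-bit word offset
            off -= 0x1_0000
        return (pc + 4 + (off << 2)) & 0xFFFF_FFFF
    return None


def predecode_chunk(base_pc: int, insns: List[int]) -> FastField:
    """Fill one chunk's fast field (assumed to happen at Icache fill time)."""
    for i, insn in enumerate(insns):
        target = direct_target(base_pc + i * INSN_BYTES, insn)
        if target is not None:
            # Keep only what the next fetch needs: the target's chunk index.
            index = (target // CHUNK_BYTES) & ((1 << FAST_BITS) - 1)
            return FastField(True, index)
    # No direct branch: the field carries ordinary instruction bits instead.
    return FastField(False, insns[0] & 0xFFFF)
```

For example, a BEQ with offset 3 at 0x1000 (`0x10000003`) targets 0x1010, so the chunk's fast field would hold chunk index 0x202 with the state bit set, available to redirect fetch without reading the slower part of the array.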

With 16-bit offsets and 32-bit instructions, a 2-cycle Icache could presumably be reorganized to provide one 16-bit fast-access portion per two instructions, with single-cycle latency for that portion.
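The storage split behind that estimate is simple arithmetic. The sketch below assumes the sizes given in the post (a 16-bit fast portion per two 32-bit instructions) plus the one extra state bit per chunk mentioned earlier, and further assumes that state bit lives in the fast subarray; the 32 KiB capacity in the second function's example is arbitrary. Function names are hypothetical.

```python
def fast_split(insn_bits: int = 32, insns_per_field: int = 2,
               fast_bits: int = 16, state_bits: int = 1):
    """Return (bits per chunk, bits in the fast subarray, fast fraction)."""
    chunk_bits = insn_bits * insns_per_field + state_bits
    # Assumption: the state bit is read together with the fast field.
    fast_total = fast_bits + state_bits
    return chunk_bits, fast_total, fast_total / chunk_bits


def state_overhead_bytes(icache_bytes: int, chunk_bytes: int = 8,
                         state_bits: int = 1) -> int:
    """Extra storage for the per-chunk state bits, in bytes."""
    return icache_bytes // chunk_bytes * state_bits // 8
```

With the defaults, each 65-bit chunk places 17 bits (about 26%) in the fast subarray, and a 32 KiB Icache of 8-byte chunks needs 4096 state bits, i.e., 512 bytes.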

This concept could be extended to short branch encodings (requiring additional metadata) and to variable-length instruction encodings (branch instructions that cross a fetch chunk boundary would be more difficult to handle, and those crossing a cache block boundary even more so).
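The boundary cases for variable-length encodings can be made concrete with a small classifier. This is a sketch under assumed parameters, not from the post: byte-addressed instructions of 2 or 4 bytes (RISC-V-like), 16-byte fetch chunks, and 64-byte cache blocks. `Insn` and `crossing_branches` are hypothetical names.

```python
from typing import List, NamedTuple


class Insn(NamedTuple):
    offset: int      # byte offset from the start of the code region
    length: int      # bytes (e.g., 2 or 4 in a RISC-V-like encoding)
    is_branch: bool


def crossing_branches(insns: List[Insn], chunk_bytes: int = 16,
                      block_bytes: int = 64):
    """Classify branches whose encoding spans a fetch chunk or cache block boundary."""
    chunk_cross, block_cross = [], []
    for insn in insns:
        if not insn.is_branch:
            continue
        first, last = insn.offset, insn.offset + insn.length - 1
        if first // block_bytes != last // block_bytes:
            block_cross.append(insn.offset)   # hardest case: two cache blocks
        elif first // chunk_bytes != last // chunk_bytes:
            chunk_cross.append(insn.offset)   # split across two fetch chunks
    return chunk_cross, block_cross
```

A 4-byte branch at offset 14 straddles two fetch chunks, while one at offset 62 straddles two cache blocks; in the second case the predecoder would not even have the whole branch available within a single block fill.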

This concept (which I had posted years ago on comp.arch and which someone mentioned had been considered by Alpha designers) is something between a BTB and an aggressively predecoded instruction cache. (It is not a BTB in the traditional sense because it mostly uses instruction storage; since expanding storage as part of predecode is not unusual, it might be classified as a predecoded instruction cache.) It is also a use of subblock NUCA, where part of a cache block has lower latency; another obvious use would be storing pointers, or enough of a pointer to generate a cache address.
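To make "enough of a pointer to generate a cache address" concrete: a set-associative cache needs only the line-offset and set-index bits to begin an access, so only those low-order bits would have to sit in the fast subblock. The sketch assumes power-of-two geometry; the 32 KiB, 8-way, 64-byte-line example is illustrative rather than from the post, and `index_bits_needed` is a hypothetical name. Tag comparison still needs the remaining bits, which could presumably come from the slower part of the block.

```python
def index_bits_needed(cache_bytes: int, ways: int, line_bytes: int) -> int:
    """Low-order address bits needed to select a set and a byte within a line."""
    sets = cache_bytes // (ways * line_bytes)
    for n in (sets, line_bytes):
        if n <= 0 or n & (n - 1):
            raise ValueError("geometry must be a power of two")
    offset_bits = line_bytes.bit_length() - 1
    set_bits = sets.bit_length() - 1
    return offset_bits + set_bits
```

For a 32 KiB, 8-way cache with 64-byte lines there are 64 sets, so 6 offset bits plus 6 index bits, i.e., the low 12 bits of the pointer, are enough to start the access.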

> And even the only 5-stage pipeline of most of the "original"
> RISC processors needed the one delay slot for this.

Except for MIPS jumps (J/JAL concatenate the target field with the upper PC bits rather than adding an offset), the offset addition would take some time. Without branch direction prediction, one would still need to evaluate the branch condition. One would also need to decode the instruction sufficiently to know that it was a control-flow instruction (and of what type). If branches had been encoded as an inset with implicit carry/borrow and metadata had been provided for quick branch recognition, then a delay slot might not have been necessary.
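The difference between an offset and an inset can be sketched as follows. This assumes an inset of `k` low-order target bits that simply replace the PC's low bits, plus a carry/borrow value of -1, 0, or +1 applied to the upper PC bits. Here that value is passed explicitly; the post's "implicit" variant, where it is derived rather than stored, is not modeled. Addresses are instruction-word indices, and all function names are hypothetical. The low bits need no adder, and the upper bits need at most an increment or decrement, which hardware could precompute for all three cases and select once the carry/borrow is known.

```python
from typing import Optional, Tuple


def encode_inset(pc_word: int, target_word: int, k: int) -> Optional[Tuple[int, int]]:
    """Split a target into (carry, inset) relative to pc_word; None if out of range."""
    mask = (1 << k) - 1
    carry = (target_word >> k) - (pc_word >> k)
    if carry not in (-1, 0, 1):
        return None  # target too far away for a k-bit inset
    return carry, target_word & mask


def decode_inset(pc_word: int, carry: int, inset: int, k: int) -> int:
    """Rebuild the target: no adder on the low k bits, at most +/-1 on the rest."""
    upper = pc_word >> k
    # All three candidates depend only on the PC, so they can be ready early.
    candidates = {-1: upper - 1, 0: upper, 1: upper + 1}
    return (candidates[carry] << k) | inset


def decode_offset(pc_word: int, offset: int) -> int:
    """Conventional PC-relative target: a full-width addition."""
    return pc_word + offset
```

Both schemes reach the same targets within range; the inset form only moves the work from a full-width add to a small select.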

(Mitch Alsup once suggested on comp.arch using predecode to provide microarchitectural delayed branches.)


Topic	Posted By	Date
Branch/jump target prediction	Travis	2016/08/09 10:44 AM
Early decode of unconditional jumps	Peter Cordes	2016/08/09 12:35 PM
Early decode of unconditional jumps	Exophase	2016/08/09 01:29 PM
pipelines are too long, no	Heikki Kultala	2016/08/09 12:37 PM
pipelines are too long, no	no name	2016/08/09 07:17 PM
pipelines are too long, no	Wilco	2016/08/10 02:43 AM
pipelines are too long, no	Paul A. Clayton	2016/08/10 08:44 PM
Converged BTB/Icache	Paul A. Clayton	2016/08/10 08:44 PM
Branch/jump target prediction	sylt	2016/08/10 03:27 AM
Branch/jump target prediction	Peter Cordes	2016/08/12 04:23 PM
Branch/jump target prediction	sylt	2016/08/12 11:35 PM
Branch/jump target prediction	Mr. Camel	2016/08/10 10:43 AM
Branch/jump target prediction	Linus Torvalds	2016/08/10 12:46 PM
Branch/jump target prediction	Megol	2016/08/10 03:25 PM
Branch/jump target prediction	Linus Torvalds	2016/08/10 05:14 PM
Branch/jump target prediction	David Kanter	2016/08/12 12:09 AM
Branch/jump target prediction	Linus Torvalds	2016/08/12 12:25 PM
Branch/jump target prediction	⚛	2016/08/14 05:24 AM
Branch/jump target prediction	Maynard Handley	2016/08/14 07:47 AM
Branch/jump target prediction	David Kanter	2016/08/14 08:13 AM
Branch/jump target prediction	⚛	2016/08/16 06:19 AM
Branch/jump target prediction	Tim McCaffrey	2016/08/14 08:12 AM
Branch/jump target prediction	David Kanter	2016/08/14 08:18 AM
Branch/jump target prediction	Gabriele Svelto	2016/08/14 02:09 PM
Just a thought	Anon	2016/08/14 10:40 AM
Just a thought	⚛	2016/08/16 06:58 AM
Just a thought	Anon	2016/08/16 08:45 AM
Just a thought	⚛	2016/08/16 09:36 AM
Branch/jump target prediction	Linus Torvalds	2016/08/14 10:40 AM
Branch/jump target prediction	⚛	2016/08/16 06:40 AM
Branch/jump target prediction	Ricardo B	2016/08/16 07:39 AM
Branch/jump target prediction -8	⚛	2016/08/16 09:23 AM
Branch/jump target prediction -8	anon	2016/08/16 10:09 AM
Branch/jump target prediction -8	Ricardo B	2016/08/16 10:33 AM
Branch/jump target prediction -8	Exophase	2016/08/16 11:02 AM
Branch/jump target prediction -8	Ricardo B	2016/08/16 11:31 AM
SPU hbr instruction (hint for branch)	vvid	2016/08/16 12:31 PM
Branch/jump target prediction -8	no name	2016/08/17 08:16 AM
Branch/jump target prediction -8	Gabriele Svelto	2016/08/16 11:46 AM
Branch/jump target prediction -8	Etienne	2016/08/17 01:27 AM
Branch/jump target prediction -8	Gabriele Svelto	2016/08/17 03:52 AM
Branch/jump target prediction -8	Maynard Handley	2016/08/18 10:02 AM
Branch/jump target prediction -8	⚛	2016/08/18 06:21 PM
Branch/jump target prediction -8	Maynard Handley	2016/08/18 07:27 PM
Branch/jump target prediction -8	Megol	2016/08/19 04:29 AM
Part 1/N - CPU-internal JIT	⚛	2016/08/19 04:44 AM
Atom, you're such a comedian.	Jim Trent	2016/08/18 10:39 PM
Atom, you're such a comedian.	⚛	2016/08/19 03:23 AM
Branch/jump target prediction -8	Etienne	2016/08/19 01:25 AM
Branch/jump target prediction -8	Simon Farnsworth	2016/08/19 04:17 AM
Branch/jump target prediction -8	Michael S	2016/08/19 06:39 AM
Branch/jump target prediction -8	anon	2016/08/19 07:29 AM
Branch/jump target prediction -8	Simon Farnsworth	2016/08/19 08:34 AM
Branch/jump target prediction -8	anon	2016/08/19 08:48 AM
Branch/jump target prediction -8	Exophase	2016/08/19 11:03 AM
Branch/jump target prediction -8	Maynard Handley	2016/08/19 11:34 AM
Branch/jump target prediction -8	David Kanter	2016/08/20 12:23 AM
Branch/jump target prediction -8	Ricardo B	2016/08/19 07:18 AM
Branch/jump target prediction -8	Maynard Handley	2016/08/19 08:41 AM
Branch/jump target prediction -8	Michael S	2016/08/19 09:26 AM
Branch/jump target prediction -8	Maynard Handley	2016/08/19 01:47 PM
Branch/jump target prediction -8	Michael S	2016/08/21 01:53 AM
Branch/jump target prediction -8	Ricardo B	2016/08/22 05:17 AM
Branch/jump target prediction -8	Michael S	2016/08/22 05:58 AM
Branch/jump target prediction -8	Ricardo B	2016/08/22 07:50 AM
Branch/jump target prediction -8	Simon Farnsworth	2016/08/19 09:28 AM
Branch/jump target prediction -8	Simon Farnsworth	2016/08/19 09:40 AM
Branch/jump target prediction -8	David Kanter	2016/08/23 12:05 AM
Branch/jump target prediction -8	Maynard Handley	2016/08/23 07:49 AM
Branch/jump target prediction -8	anon	2016/08/26 08:00 AM
Branch/jump target prediction -8	anon	2016/08/26 08:14 AM
Branch/jump target prediction	Megol	2016/08/19 04:23 AM
Branch/jump target prediction	Megol	2016/08/19 07:42 AM
Branch/jump target prediction	Maynard Handley	2016/08/19 11:46 AM
Branch/jump target prediction	David Kanter	2016/08/20 12:34 AM
Branch/jump target prediction	Maynard Handley	2016/08/20 07:07 AM
Branch/jump target prediction	sylt	2016/08/19 11:48 AM
Branch/jump target prediction	sylt	2016/08/19 12:00 PM
Branch/jump target prediction	Megol	2016/08/21 10:27 AM
The (apparent) state of trace caches on modern CPUs	Maynard Handley	2016/08/22 03:10 PM
The (apparent) state of trace caches on modern CPUs	Exophase	2016/08/22 08:55 PM
The (apparent) state of trace caches on modern CPUs	anon	2016/08/23 12:36 AM
The (apparent) state of trace caches on modern CPUs	Exophase	2016/08/23 05:08 AM
The (apparent) state of trace caches on modern CPUs	anon	2016/08/23 09:51 PM
The (apparent) state of trace caches on modern CPUs	Exophase	2016/08/23 11:12 PM
The (apparent) state of trace caches on modern CPUs	Maynard Handley	2016/08/24 07:38 AM
The (apparent) state of trace caches on modern CPUs	anon	2016/08/24 08:26 PM
The (apparent) state of trace caches on modern CPUs	Maynard Handley	2016/08/23 07:48 AM
That's not true	David Kanter	2016/08/23 09:39 AM
That's not true	Maynard Handley	2016/08/23 09:56 AM
The (apparent) state of trace caches on modern CPUs	anon	2016/08/23 09:54 PM
The (wrong) state of trace caches on modern CPUs	Eric Bron	2016/08/25 02:38 AM
The (wrong) state of trace caches on modern CPUs	Michael S	2016/08/25 03:28 AM
The (wrong) state of trace caches on modern CPUs	Eric Bron	2016/08/25 07:12 AM
The (wrong) state of trace caches on modern CPUs	Maynard Handley	2016/08/25 09:50 AM
The (wrong) state of trace caches on modern CPUs	Michael S	2016/08/25 10:36 AM
The (wrong) state of trace caches on modern CPUs	Exophase	2016/08/25 11:32 AM
The (wrong) state of trace caches on modern CPUs	Eric Bron	2016/08/25 11:12 AM
The (wrong) state of trace caches on modern CPUs	Maynard Handley	2016/08/25 12:01 PM
The (wrong) state of trace caches on modern CPUs	Eric Bron	2016/08/25 12:20 PM
The (wrong) state of trace caches on modern CPUs	Maynard Handley	2016/08/25 01:34 PM
Branch/jump target prediction	Gabriele Svelto	2016/08/11 01:15 PM
Branch/jump target prediction	Gabriele Svelto	2016/08/20 07:21 AM

Converged BTB/Icache