Author Logical handling
Dieter Britz

2006-08-31, 7:00 pm

I want to handle matrices of type logical, and I reckon that
these ought to be done faster than the equivalent integer
matrices with MOD(*,2) hung on. However, on my Intel compiler,
the two came out almost exactly the same in cpu time. Clearly,
logicals are stored in whole words, rather than bits.
Is this general for compilers, or is it an Intel thing?
--
Dieter Britz, Kemisk Institut, Aarhus Universitet
Ron Shepard

2006-08-31, 7:00 pm

In article <ed6r3m$1to$2@news.net.uni-c.dk>,
Dieter Britz <britz@chem.au.dk> wrote:

> Clearly,
> logicals are stored in whole words, rather than bits.
> Is this general for compilers, or is it an Intel thing?


The language standard requires the storage unit for the default
logical, integer, and real to be the same size. However, there may
be other KIND values for logical variables that may be closer to
what you want. I don't know of any compiler that supports a one-bit
KIND, but some compilers do support one-byte KINDS for logicals and
integers.
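
A minimal sketch of how one might check this on a given compiler, assuming Fortran 2008 support for the `logical_kinds` constant in `iso_fortran_env` and for the `storage_size` intrinsic. The name `i1` is illustrative only, and kind numbers are compiler-specific labels that need not equal byte counts:

```fortran
program logical_kinds_demo
  use, intrinsic :: iso_fortran_env, only: logical_kinds
  implicit none
  ! Smallest integer kind holding at least 2 decimal digits
  ! (often, but not necessarily, one byte).
  integer, parameter :: i1 = selected_int_kind(2)
  logical     :: default_flag
  integer     :: default_int
  integer(i1) :: small_int
  integer     :: k

  ! storage_size reports the size in bits of one element.
  print '(a,i0)', 'bits in default LOGICAL:  ', storage_size(default_flag)
  print '(a,i0)', 'bits in default INTEGER:  ', storage_size(default_int)
  print '(a,i0)', 'bits in smallest INTEGER: ', storage_size(small_int)

  ! List every LOGICAL kind value the compiler supports.
  do k = 1, size(logical_kinds)
     print '(a,i0)', 'supported LOGICAL kind: ', logical_kinds(k)
  end do
end program logical_kinds_demo
```

If the first two lines of output agree, the compiler follows the storage-unit rule described above; the list of kinds shows whether anything smaller is on offer.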

$.02 -Ron Shepard
dpb

2006-08-31, 7:00 pm


Dieter Britz wrote:
> I want to handle matrices of type logical, and I reckon that
> these ought to be done faster than the equivalent integer
> matrices with MOD(*,2) hung on. However, on my Intel compiler,
> the two came out almost exactly the same in cpu time. Clearly,
> logicals are stored in whole words, rather than bits.
> Is this general for compilers, or is it an Intel thing?
> --


It's general on virtually any existing processor. For Fortran LOGICAL
is, as you note, equivalent in length to INTEGER and does not have an
intrinsic bit type. Since Intel processors in particular (and
virtually all in general) are word- or byte-addressable, the faster
operation(s) are on words, and addressing individual bits would, if
implemented, be more compute-intensive than the equivalent word
operation(s).

As for the difference between direct use of LOGICAL and INTEGER w/
MOD() not showing significant difference, that is probably owing to the
optimization by the compiler and the size/structure of the test(s).

Terence

2006-09-02, 7:00 pm

> It's general on virtually any existing processor. For Fortran LOGICAL
> is, as you note, equivalent in length to INTEGER and does not have an
> intrinsic bit type. Since Intel processors in particular (and
> virtually all in general) are word- or byte-addressable, the faster
> operation(s) are on words,


True so far.

>...and addressing individual bits would, if implemented,
> be more compute-intensive than the equivalent word operation(s).


This sounds logical but is debatable.
TAU Systems and Quantum built their Market Research processing software
(since the early seventies) on using 16 (or 32) bit words, where each
bit represents a truth value. Almost all other commercial research
software uses ASCII files and single- to multiple-character
representations of codes, on the reasoning that disk space and memory
are cheap and CPU processing is now essentially instantaneous.

But it ain't always so...

In searching for conditions or patterns, the bit-wise parallel
processing of 16 (or 32) bit words is 16 (or 32) times faster and takes
up 1/16th (or 1/32nd) of the space and access time, whether in memory or
on disk. Since a character is usually 8 bits, not 16, the gain in total
time is 16*16/2 = 128 (or 32*32/2 = 512) times, versus a character
representation and processing. These are still important efficiency
factors.
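
As a rough illustration of the bit-parallel idea (not the TAU or Quantum code, which is not shown in this thread), the sketch below packs one YES/NO answer per bit and tests a whole word of respondents with a single `iand`. The variable names and the sample data are made up, and `popcnt` assumes a Fortran 2008 compiler:

```fortran
program bit_mask_demo
  implicit none
  integer :: owns_car, owns_bike, both

  ! Bit p of each word is the YES/NO answer of respondent p (0-based).
  ! The data here is hypothetical.
  owns_car  = ibset(ibset(ibset(0, 0), 3), 5)   ! respondents 0, 3, 5
  owns_bike = ibset(ibset(ibset(0, 3), 4), 5)   ! respondents 3, 4, 5

  ! One AND tests every respondent in the word at once.
  both = iand(owns_car, owns_bike)

  ! popcnt counts the set bits, i.e. the respondents matching both conditions.
  print '(a,i0)', 'respondents answering YES to both: ', popcnt(both)
  print '(a,l1)', 'respondent 4 answered YES to both: ', btest(both, 4)
end program bit_mask_demo
```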

Quantum (now SPSS) also inverts the collected sample's global
person-response bit matrix into a response-person matrix. Although the
inversion is time-consuming, it is performed only once; after that, all
response counting for statistical purposes needs access to only one
possible response record as a filter, plus two more such records, to
produce a statistical report on demand. (The total response to a question
is then just the sample size in bits, stored as a record of bit-holding
words, where each bit is a YES from a person, in person order, rounded up
to the next multiple of the word size and therefore giving fixed-length
records that are directly accessible.)
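
The record layout described above can be sketched as follows. This is only an illustration under assumed names (`n_persons`, `set_person`, and the sample data are all hypothetical), not Quantum's actual format. Each response record holds one bit per person, rounded up to a whole number of default-integer words, and a filtered count is an AND followed by a bit count:

```fortran
program response_records
  implicit none
  integer, parameter :: n_persons = 100              ! hypothetical sample size
  integer, parameter :: nbits     = bit_size(0)      ! bits per default integer
  ! Round the record length up to the next whole word.
  integer, parameter :: n_words   = (n_persons + nbits - 1) / nbits
  integer :: response(n_words), filter(n_words)
  integer :: p

  response = 0
  filter   = 0
  do p = 1, n_persons
     if (mod(p, 3) == 0) call set_person(response, p)  ! made-up YES answers
     if (p <= 50)        call set_person(filter, p)    ! made-up subgroup
  end do

  print '(a,i0)', 'words per record:   ', n_words
  ! Count the persons in the subgroup who answered YES.
  print '(a,i0)', 'filtered YES count: ', sum(popcnt(iand(response, filter)))

contains

  ! Set the bit belonging to a (1-based) person number in a record.
  subroutine set_person(rec, person)
    integer, intent(inout) :: rec(:)
    integer, intent(in)    :: person
    integer :: w, b
    w = (person - 1) / nbits + 1       ! which word holds this person
    b = mod(person - 1, nbits)         ! which bit inside that word
    rec(w) = ibset(rec(w), b)
  end subroutine set_person

end program response_records
```

Because every record has the same length `n_words`, record number k can be located directly in a direct-access file without scanning.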

TAU uses a bit-wise vector technique that doesn't need a x vs y
one-by-one check, and produces all reports directly, on one pass. With
today's micros you get all the hundreds of possible reports on disk in
only several seconds. The time to print is another story.

I have solved very many problems much faster by thinking first, and
have often ended up using a bit-based parallel algorithm rather than a
one-by-one matrix search.

Terence Wright

Tim Prince

2006-09-02, 7:00 pm

Dieter Britz wrote:
> I want to handle matrices of type logical, and I reckon that
> these ought to be done faster than the equivalent integer
> matrices with MOD(*,2) hung on. However, on my Intel compiler,
> the two came out almost exactly the same in cpu time. Clearly,
> logicals are stored in whole words, rather than bits.
> Is this general for compilers, or is it an Intel thing?

Not only does Fortran suggest that various data types of default KIND
should occupy the same space, most architectures are designed to produce
best performance with such types, within limits.
If your matrices are large enough, and your application is such that its
performance depends on the amount of data moved around, you might find
some speedup by use of smaller sized logical KIND. I'm somewhat
surprised to find an example LOGICAL (KIND = byte) in Intel
documentation, without an explanation of where the byte parameter is
defined.
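
One possible way to supply such a `byte` parameter, shown here only as a sketch: Fortran 2003's `iso_c_binding` module provides `c_bool`, which on common compilers is a one-byte logical kind. That it is one byte, and that it is the kind the Intel example intended, are both assumptions; the program prints the actual element sizes so they can be checked:

```fortran
program byte_logical_demo
  use, intrinsic :: iso_c_binding, only: c_bool
  implicit none
  ! Assumption: c_bool is supported and is a one-byte kind on this compiler.
  integer, parameter :: byte = c_bool
  integer, parameter :: n = 2000
  logical(kind=byte), allocatable :: small(:,:)
  logical,            allocatable :: wide(:,:)

  allocate(small(n, n), wide(n, n))
  small = .false.
  wide  = .false.

  ! storage_size gives the size in bits of one array element.
  print '(a,i0)', 'bits per element, LOGICAL(KIND=byte): ', storage_size(small)
  print '(a,i0)', 'bits per element, default LOGICAL:    ', storage_size(wide)
end program byte_logical_demo
```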
robin

2006-09-09, 7:01 pm

dpb wrote in message <1157035237.049048.266930@i3g2000cwc.googlegroups.com>...
>
>Dieter Britz wrote:
>
>It's general on virtually any existing processor. For Fortran LOGICAL
>is, as you note, equivalent in length to INTEGER and does not have an
>intrinsic bit type. Since Intel processors in particular (and
>virtually all in general) are word- or byte-addressable, the faster
>operation(s) are on words,


The faster operation(s) are actually on bytes, as fewer
bytes need to be transferred to and from memory.

> and addressing individual bits would, if
>implemented, be more compute-intensive than the equivalent word
>operation(s).
>
>As for the difference between direct use of LOGICAL and INTEGER w/
>MOD() not showing significant difference, that is probably owing to the
>optimization by the compiler and the size/structure of the test(s).


Possibly, if the compiler implements MOD by a shift or by an AND.
Using the divider would be relatively slower.
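
A small sketch of the equivalence a compiler could exploit. Whether any particular compiler actually does so is not established in this thread. For non-negative values the two columns agree; for negative values `mod` keeps the sign of its first argument, so the results differ on the usual two's-complement machines:

```fortran
program parity_demo
  implicit none
  integer :: n

  print '(a)', '  n  mod(n,2)  iand(n,1)'
  do n = -3, 3
     ! mod(n,2) may involve a division; iand(n,1) just masks the lowest bit.
     print '(i3,i10,i11)', n, mod(n, 2), iand(n, 1)
  end do
end program parity_demo
```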

The OP could check whether a kind is available for either a 1-byte logical
or a 1-byte integer.


Dan Nagle

2006-09-09, 7:01 pm

Hello,

robin wrote:
> dpb wrote in message <1157035237.049048.266930@i3g2000cwc.googlegroups.com>...


<snip requoted>

>
> The faster operation(s) are actually on bytes, as fewer
> bytes need to be transferred to and from memory.


On many modern processors, words are the smallest quantity
transferred between memory and processor. Byte manipulation
is done entirely within the processor itself.

<snip the rest>

--
Cheers!

Dan Nagle
Purple Sage Computing Solutions, Inc.
Rich Townsend

2006-09-09, 10:00 pm

robin wrote:
> dpb wrote in message <1157035237.049048.266930@i3g2000cwc.googlegroups.com>...
>
> The faster operation(s) are actually on bytes, as fewer
> bytes need to be transferred to and from memory.


I'm not sure this is true. For many processors, an operation on a memory-stored
byte involves fetching the word in which the byte falls, and then extracting the
byte. Therefore, a logical spanning a byte involves the same memory bandwidth as
a logical spanning a word -- and incurs the extra overhead of extracting the byte.

>
>
> Possibly, if the compiler implements MOD by a shift or by an AND.
> Using the divider would be relatively slower.
>
> The OP could check whether a kind is available for either a 1-byte logical
> or a 1-byte integer.
>
>

Dan Nagle

2006-09-10, 7:01 pm

Hello,

Dieter Britz wrote:
> I want to handle matrices of type logical, and I reckon that
> these ought to be done faster than the equivalent integer
> matrices with MOD(*,2) hung on. However, on my Intel compiler,
> the two came out almost exactly the same in cpu time. Clearly,
> logicals are stored in whole words, rather than bits.
> Is this general for compilers, or is it an Intel thing?


Free advice, and worth every penny ... :-) (Hey, this is Usenet.)

Byte processing is done, on most modern architectures,
in the processor. For a fetch, that means extracting the byte
(usually 8-bits) from a word (usually 32-bits or 64-bits). For a store,
that means first fetching the word, next merging the byte, next
storing the word. The compiler may be smart about optimizing
the fetch/store, or maybe not.
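
The extract and merge steps can be mimicked in software with the standard bit intrinsics. This is only an analogy for what the hardware does internally, assuming a default integer of at least 32 bits; the values are arbitrary:

```fortran
program byte_in_word
  implicit none
  integer :: word, byte_value, new_byte

  word = int(z'11223344')

  ! "Fetch": pull 8 bits out of the word, starting at bit 8.
  byte_value = ibits(word, 8, 8)
  print '(a,z2.2)', 'extracted byte:   ', byte_value

  ! "Store": merge a new byte into the same position of the word,
  ! leaving the other bits untouched, as a read-modify-write would.
  new_byte = int(z'AB')
  call mvbits(new_byte, 0, 8, word, 8)
  print '(a,z8.8)', 'word after merge: ', word
end program byte_in_word
```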

Thus, _if_ your problem is cache-limited, you might benefit
by using byte logicals (to reduce the amount of cache used),
especially if the logicals form a mask which is more frequently
used than updated.

In other words, you must continue your benchmarking to tell.
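
A rough benchmarking sketch along these lines, comparing a default-kind mask with a byte-sized one. The size `n`, the repeat count, and the use of `c_bool` as the small kind are all assumptions, and the timings will depend heavily on compiler options and hardware:

```fortran
program mask_timing
  use, intrinsic :: iso_c_binding, only: c_bool
  implicit none
  integer, parameter :: n = 2000, reps = 50     ! arbitrary test sizes
  logical,         allocatable :: wide(:,:)     ! default-kind mask
  logical(c_bool), allocatable :: small(:,:)    ! assumed one-byte mask
  real    :: t0, t1
  integer :: i, j, r, hits

  allocate(wide(n, n), small(n, n))
  do j = 1, n
     do i = 1, n
        wide(i, j)  = mod(i + j, 3) == 0
        small(i, j) = wide(i, j)
     end do
  end do

  hits = 0
  call cpu_time(t0)
  do r = 1, reps
     ! Flip one element so the count cannot simply be hoisted out of the loop.
     wide(1, 1) = .not. wide(1, 1)
     hits = hits + count(wide)
  end do
  call cpu_time(t1)
  print '(a,f8.3,a,i0)', 'default LOGICAL: ', t1 - t0, ' s, hits = ', hits

  hits = 0
  call cpu_time(t0)
  do r = 1, reps
     small(1, 1) = .not. small(1, 1)
     hits = hits + count(small)
  end do
  call cpu_time(t1)
  print '(a,f8.3,a,i0)', 'byte LOGICAL:    ', t1 - t0, ' s, hits = ', hits
end program mask_timing
```

The mask here is read far more often than it is updated, which is the case where the smaller kind is most likely to help.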

Check your memory bus width to see how much is transferred
per bus clock. Even if bytes can be transferred, the bus will likely
be made busy by the transfer and unavailable for other use.

For matrices, also check your column size to avoid thrashing
cache line sets by trying to keep too much of the matrix
on one cache line set.

HTH

--
Cheers!

Dan Nagle
Purple Sage Computing Solutions, Inc.
Janne Blomqvist

2006-09-10, 7:01 pm

In article <ofWMg.354$Rw2.284@trnddc02>, Dan Nagle wrote:
> Hello,
>
> Dieter Britz wrote:
>
> Free advice, and worth every penny ... :-) (Hey, this is Usenet.)
>
> Byte processing is done, on most modern architectures,
> in the processor. For a fetch, that means extracting the byte
> (usually 8-bits) from a word (usually 32-bits or 64-bits). For a store,
> that means first fetching the word, next merging the byte, next
> storing the word. The compiler may be smart about optimizing
> the fetch/store, or maybe not.
>
> Thus, _if_ your problem is cache-limited, you might benefit
> by using byte logicals (to reduce the amount of cache used),
> especially if the logicals form a mask which is more frequently
> used than updated.
>
> In other words, you must continue your benchmarking to tell.
>
> Check your memory bus width to see how much is transferred
> per bus clock. Even if bytes can be transferred, the bus will likely
> be made busy by the transfer and unavailable for other use.


AFAIK, on the usual cache based architectures these days the minimum
unit for transfers to and from main memory is an entire cache
line.

--
Janne Blomqvist
Dan Nagle

2006-09-10, 7:01 pm

Hello,

Janne Blomqvist wrote:
> In article <ofWMg.354$Rw2.284@trnddc02>, Dan Nagle wrote:


<snip>

>
> AFAIK, on the usual cache based architectures these days the minimum
> unit for transfers to and from main memory is an entire cache
> line.


Yes, there are different busses between the processor (core)
and the cache, and between the cache and the main memory.
Each will (in general) have a different clock; the bus to main memory
often operates in a burst mode, transferring between all banks
of main memory and a complete cache line.

Still, it's unlikely for a modern processor or motherboard
to support single byte transfers anywhere outside the processor.
(At least, not without consuming the same resources as word transfers.)

--
Cheers!

Dan Nagle
Purple Sage Computing Solutions, Inc.
robin

2006-10-12, 7:07 pm

"Dieter Britz" <britz@chem.au.dk> wrote in message
news:ed6r3m$1to$2@news.net.uni-c.dk...
> I want to handle matrices of type logical, and I reckon that
> these ought to be done faster than the equivalent integer
> matrices with MOD(*,2) hung on. However, on my Intel compiler,
> the two came out almost exactly the same in cpu time. Clearly,
> logicals are stored in whole words, rather than bits.


That is correct.
Have you tried using byte-sized logicals?
I found that these run twice as fast as a standard integer,
using matrices of size 2,000 by 2,000.

> Is this general for compilers, or is it an Intel thing?
> --
> Dieter Britz, Kemisk Institut, Aarhus Universitet




Copyright 2009 codecomments.com