This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



Tiled memory


Discussions in this group are really boring, and limit themselves
to obscure bugs in bash and the like. Let's talk about something else,
something new for a change.

I am adding MMX support to lcc-win32.
As you may know, MMX introduces SIMD parallelism to the x86
architecture. Besides the obvious benefit of 8-byte memory moves
and other goodies, the parallelism of the new instruction set
will be a challenge for compiler writers.

I will try to introduce the concept of a 'tiled' vector, using a
special datatype. These vectors will be handled in parallel by the
compiler, i.e. if you declare

	_tiled int vector1[1024],vector2[1024],vector3[1024];

you will be able to write something like:

	vector3 = vector1+vector2;

and the compiler will add those vectors two elements at a time, in
parallel. The dimensions must match, of course, and must be known at
compile time.
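To make "two adds in parallel" concrete without MMX hardware, here is a minimal portable C sketch of the same idea (usually called SWAR, SIMD within a register): a 64-bit word holds two 32-bit lanes, and one masked 64-bit addition adds both lanes at once without letting a carry leak from one lane into the next. The names `swar_add`, `tiled_add_int` and the `LANES*_HI` masks are hypothetical, not part of lcc-win32; `uint32_t` is assumed instead of `int` so that wraparound is well defined in C. On real MMX hardware the compiler would simply emit PADDD (or PADDW/PADDB for shorter lanes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical masks: the top bit of every lane for 2 x 32-bit,
   4 x 16-bit and 8 x 8-bit lanes in a 64-bit word. */
#define LANES32_HI 0x8000000080000000ULL
#define LANES16_HI 0x8000800080008000ULL
#define LANES8_HI  0x8080808080808080ULL

/* Lane-wise wraparound add (what PADDD/PADDW/PADDB do in one instruction).
   Clearing each lane's top bit before adding means a carry can reach that
   top bit but never cross into the next lane; the top bits are then fixed
   up with XOR, since a sum bit is a ^ b ^ carry-in. */
static uint64_t swar_add(uint64_t a, uint64_t b, uint64_t hi)
{
    uint64_t low = (a & ~hi) + (b & ~hi);
    return low ^ ((a ^ b) & hi);
}

/* Hypothetical expansion of "vector3 = vector1 + vector2" for 32-bit
   elements: two elements per 64-bit step. n must be even. */
static void tiled_add_int(const uint32_t *v1, const uint32_t *v2,
                          uint32_t *v3, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        uint64_t a = (uint64_t)v1[i] | ((uint64_t)v1[i + 1] << 32);
        uint64_t b = (uint64_t)v2[i] | ((uint64_t)v2[i + 1] << 32);
        uint64_t s = swar_add(a, b, LANES32_HI);
        v3[i]     = (uint32_t)s;          /* low lane  */
        v3[i + 1] = (uint32_t)(s >> 32);  /* high lane */
    }
}
```

The same `swar_add` with `LANES16_HI` or `LANES8_HI` gives the 4-way and 8-way cases described next.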

If you declare:

	_tiled short vector1[2048],vector2[2048];

the 16-bit numbers will be added 4 at a time, in parallel. With byte
operations the number goes up to 8 operations in parallel. You will
be able to obtain a vector of bits by comparing two strings 8 bytes
at a time (using a _tiled char).
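The byte-comparison case can be sketched in portable C as well. MMX's PCMPEQB yields 0xFF in each byte position where two operands agree and 0x00 elsewhere; the sketch below computes the same mask with a zero-byte test on a 64-bit word, then (as an extra, assumed step) collapses it to one bit per byte. `load8`, `eq_mask8` and `compare8` are hypothetical helper names, and both strings are assumed to have at least 8 readable bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load 8 bytes with p[i] in bits 8*i..8*i+7, independent of host byte order. */
static uint64_t load8(const char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)(unsigned char)p[i] << (8 * i);
    return v;
}

/* Byte-wise equality mask, PCMPEQB style: 0xFF where bytes match, else 0x00.
   x = a ^ b has a zero byte exactly where a and b agree. Adding 0x7F to the
   low 7 bits of a byte sets its top bit iff those bits are non-zero (with no
   carry into the next byte); OR-ing x back in covers bytes whose only set
   bit is the top one. */
static uint64_t eq_mask8(uint64_t a, uint64_t b)
{
    const uint64_t lo7 = 0x7F7F7F7F7F7F7F7FULL;
    uint64_t x = a ^ b;
    uint64_t nonzero = ((x & lo7) + lo7) | x; /* top bit set in differing bytes */
    uint64_t zero_hi = ~nonzero & ~lo7;       /* top bit set in equal bytes     */
    return (zero_hi >> 7) * 0xFF;             /* widen 0x01 to 0xFF per byte    */
}

/* Hypothetical helper: compare 8 bytes of s and t at once and return a
   vector of bits, bit i set when s[i] == t[i]. */
static unsigned compare8(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    uint64_t m = eq_mask8(load8(s), load8(t));
    unsigned bits = 0;
    for (int i = 0; i < 8; i++)
        bits |= (unsigned)((m >> (8 * i)) & 1) << i;
    return bits;
}
```

For example, comparing "abcdefgh" with "abXdefYh" gives a mask with bits 2 and 6 clear.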

Another new concept is saturation arithmetic. With the
_saturated keyword, additions, subtractions, etc. will use saturation
arithmetic instead of the normal wraparound. For instance

	_saturated unsigned char a = 150, b = 150, c;
	c = a + b;

'c' now contains 255 instead of 44 (300 - 256), which is what the
normal wraparound gives today. These operations can of course be combined.
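Per lane, saturation simply clamps the exact result to the range of the type. A minimal C sketch of the lane semantics, assuming the MMX instructions PADDUSB (unsigned), PADDSB (signed) and PSUBUSB are what the compiler would emit for 8-bit lanes; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 8-bit saturating add (PADDUSB semantics, one lane):
   results above 255 clamp to 255 instead of wrapping. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > UINT8_MAX ? UINT8_MAX : (uint8_t)s;
}

/* Signed 8-bit saturating add (PADDSB semantics, one lane):
   results clamp to [-128, 127]. */
static int8_t sat_add_s8(int8_t a, int8_t b)
{
    int s = a + b;
    if (s > INT8_MAX) return INT8_MAX;
    if (s < INT8_MIN) return INT8_MIN;
    return (int8_t)s;
}

/* Unsigned 8-bit saturating subtract (PSUBUSB semantics): clamps at 0. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```

With the example above, `sat_add_u8(150, 150)` is 255, whereas the plain wraparound `(uint8_t)(150 + 150)` is 44.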

Special variables will let you use the MMX registers directly.
_mmx0 to _mmx7 denote the MMX registers, which are 64 bits wide. These
registers, aliased to the FPU registers, are NOT organized as a stack
and can be addressed individually. The datatype can be described in C as:
typedef union {
	struct {			/* two 32-bit lanes */
		int	low_0_31;	/* x86 is little-endian: first member = lowest bits */
		int	high_32_63;
	} int32;
	struct {			/* four 16-bit lanes */
		short	low_0_15;
		short	low_16_31;
		short	high_32_47;
		short	high_48_63;
	} int16;
	struct {			/* eight 8-bit lanes */
		char	low_0_7;
		char	low_8_15;
		char	low_16_23;
		char	low_24_31;
		char	high_32_39;
		char	high_40_47;
		char	high_48_55;
		char	high_56_63;
	} int8;
} _mmxData;

Individual bytes, shorts and ints must be individually addressable
so that the pack/unpack operations can be controlled.
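The union fixes the lane layout in memory; the same lane addressing can also be expressed portably, independent of host byte order, with shifts and masks on a 64-bit value. This sketch (the helpers `lane_get`, `lane_set` and `unpack_lo_bytes` are hypothetical) reads and replaces individual lanes and uses them to emulate one MMX unpack instruction, PUNPCKLBW, which interleaves the low four bytes of two registers.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering one lane of the given width (8, 16 or 32 bits). */
static uint64_t lane_mask(unsigned width)
{
    assert(width == 8 || width == 16 || width == 32);
    return (1ULL << width) - 1;
}

/* Read lane i; lane 0 occupies bits 0..width-1. */
static uint64_t lane_get(uint64_t v, unsigned width, unsigned i)
{
    assert(i < 64 / width);
    return (v >> (i * width)) & lane_mask(width);
}

/* Return v with lane i replaced by x (x is truncated to the lane width). */
static uint64_t lane_set(uint64_t v, unsigned width, unsigned i, uint64_t x)
{
    assert(i < 64 / width);
    uint64_t m = lane_mask(width) << (i * width);
    return (v & ~m) | ((x << (i * width)) & m);
}

/* PUNPCKLBW-style unpack: result bytes, lowest first, are
   a0 b0 a1 b1 a2 b2 a3 b3, taken from the low four bytes of a and b. */
static uint64_t unpack_lo_bytes(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        r = lane_set(r, 8, 2 * i,     lane_get(a, 8, i));
        r = lane_set(r, 8, 2 * i + 1, lane_get(b, 8, i));
    }
    return r;
}
```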

To come back to parallelism, I will borrow many concepts from the
once famous but now forgotten programming language APL. I will
introduce vector operations as an extension of the normal operations,
along with many of the APL goodies like the inner product, the outer
product, the reduce (+/ operator), etc. For instance:
	int sum = +/ vector;
This will sum the vector in parallel, 2, 4 or 8 elements at a time
depending on the element size. The algorithm should be something like:

	_tiled vector[16];

	_mmx0 = 0;
	_mmx0 += vector[0] + vector[8];
	_mmx0 += vector[1] + vector[9];
	.....
	_mmx0 += vector[7] + vector[15];
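The accumulation above can be sketched in portable C by modelling one 64-bit MMX register as a small array of lanes: each step adds a whole tile to the accumulator lane by lane, and a final horizontal step adds the lanes together (a step the outline above leaves implicit). `LANES` and `reduce_add` are hypothetical names; 32-bit `int` lanes, two per register, are assumed. A real MMX version would use PADDD on a register instead of the inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 2   /* assumption: two 32-bit ints per 64-bit register */

/* Sketch of "int sum = +/ vector;" with a single accumulator register. */
static int32_t reduce_add(const int32_t *v, size_t n)
{
    assert(n % LANES == 0);        /* tiled dimensions known at compile time */
    int32_t acc[LANES] = {0};      /* models _mmx0 = 0 */
    for (size_t i = 0; i < n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += v[i + l];    /* one packed add: all lanes at once */
    int32_t sum = 0;               /* horizontal add of the lanes */
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];
    return sum;
}
```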

To maximize the pipeline effect, we can use:

	_mmx0 = _mmx1 = _mmx2 ... = 0;
	_mmx0 += vector[0] + vector[8];
	_mmx1 += vector[1] + vector[9];
	...
	etc. 
The 8 MMX registers are then added together into _mmx0 at the
end of the operation. In theory, this allows an 8-stage
pipeline.
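A sketch of the multi-accumulator variant under the same assumptions (hypothetical names, two 32-bit lanes per register, eight accumulators standing in for _mmx0.._mmx7). Consecutive tiles go to different accumulators, so successive adds do not depend on each other's results and can overlap in the pipeline; the accumulators are folded into the first one at the end, as described above. Leftover elements that do not fill a full round are added at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 2   /* assumption: 32-bit lanes per 64-bit register */
#define NACC  8   /* independent accumulators, like _mmx0.._mmx7 */

static int32_t reduce_add_pipelined(const int32_t *v, size_t n)
{
    int32_t acc[NACC][LANES] = {{0}};
    size_t i = 0;
    /* Main loop: NACC tiles per round, one tile per accumulator. */
    for (; i + NACC * LANES <= n; i += NACC * LANES)
        for (size_t r = 0; r < NACC; r++)
            for (size_t l = 0; l < LANES; l++)
                acc[r][l] += v[i + r * LANES + l];
    /* Fold accumulators 1..7 into accumulator 0. */
    for (size_t r = 1; r < NACC; r++)
        for (size_t l = 0; l < LANES; l++)
            acc[0][l] += acc[r][l];
    /* Horizontal add of the lanes, then the leftover elements. */
    int32_t sum = 0;
    for (size_t l = 0; l < LANES; l++)
        sum += acc[0][l];
    for (; i < n; i++)
        sum += v[i];
    return sum;
}
```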

Similar to the reduce operator, we have the +\ (scan) operator,
which produces the running sums.

Suppose we have

	_tiled vector1[5], vector2[] = { 1, 2, 3, 4, 5 };
	vector1 = +\vector2;
	gives:
	1       3        6          10          15
	(0+1) (0+1+2) (0+1+2+3) (0+1+2+3+4) (0+1+2+3+4+5)
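A minimal C sketch of +\ under stated assumptions: `plus_scan` gives the sequential meaning, and `scan16x4` shows one way (a Hillis-Steele shift-and-add scan, an assumed technique not taken from the text) to scan four 16-bit lanes inside a single 64-bit word in log2(4) = 2 steps. A left shift by 16 moves every lane up one position and fills lane 0 with zero, which is what the MMX shift PSLLQ does; `add16x4` models PADDW. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential meaning of "vector1 = +\vector2": out[i] = in[0] + ... + in[i]. */
static void plus_scan(const int32_t *in, int32_t *out, size_t n)
{
    int32_t run = 0;
    for (size_t i = 0; i < n; i++) {
        run += in[i];
        out[i] = run;
    }
}

/* Lane-wise 16-bit wraparound add in a 64-bit word (PADDW semantics). */
static uint64_t add16x4(uint64_t a, uint64_t b)
{
    const uint64_t hi = 0x8000800080008000ULL;
    return ((a & ~hi) + (b & ~hi)) ^ ((a ^ b) & hi);
}

/* In-register scan of four 16-bit lanes: lanes a b c d (lane 0 first)
   become a, a+b, a+b+c, a+b+c+d. */
static uint64_t scan16x4(uint64_t x)
{
    x = add16x4(x, x << 16);   /* a, a+b, b+c, c+d     */
    x = add16x4(x, x << 32);   /* a, a+b, a+b+c, a+b+c+d */
    return x;
}
```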
---------------------------------------------------------------

Well, I will stop here. I am wasting bandwidth that would be
better used discussing /groff/termcap/vi/bash/ls/less/old.

P.S. I still see mail about 'less'. It somehow still exists, and so
does termcap, even though there have been no terminals around for ages...

What is 'less'?
Its goal is to display a text file, isn't it?

Imagine this:

Several years ago, researchers at Xerox (who else?) published the
results of experimenting with a graphical control that presented
text to the user as a ROLL. You rolled the text slowly into view.
Millions of years of evolution have trained the eye to see objects
in 3 dimensions, so text that rolled from the back left of the screen
to the center and on to the right gave the eye cues that made the
text easier to recognize.

A control that does that would be easy to write using the 3D
graphics libraries that are everywhere...

Yes but how about the termcap file for that??? :-)

Have fun guys, and stop bashing bash!

-- 
Jacob Navia	Logiciels/Informatique
41 rue Maurice Ravel			Tel 01 48.23.51.44
93430 Villetaneuse 			Fax 01 48.23.95.39
France

