Small C++ Test.

ficedula · Apr 27, 2006

To ficedula:

Ding ding ding ding! Congratulations on being the first to note that some of these are undefined. There are two intentions here:
1: Bring about more awareness towards the types of things you should avoid while coding because the results may not be consistent across compilers.
2: Have a bit of fun by logically deducing what should happen if the ANSI C/C++ specification is not obscured by implementation. The ANSI C++ specification says explicitly that values modified twice between sequence points yield undefined results; however, the actual result can be logically deduced.

Well, if by "logically deduced" you mean, "make a reasonable guess about what a particular version of a particular compiler might do", then yeah, you can do that. But undefined means just that: the compiler can legally do anything and remain spec-compliant. Can even change between compiler versions, which is why it's never any better than a guessing game.

2: For all intents and purposes, there is no need to explain that the int pointers aren’t valid addresses; that much is clear at first glance. But you are correct to deduce that the specifications don’t specifically define that the validity of constant pointers be verified at all during compilation (in fact doing so may lead to erroneous output across systems), so the result of constants 6 and 2 are just as valid as constants 0x100006 and 0x100002 (and both yield the same result).

Bzzzzt, wrong? I don't believe the C++ spec mandates that pointers have to be simple integers containing the raw memory address they're pointing to. The three issues here are:

1) On systems where memory operations have to be aligned, 0x6 (and 0x100006) could never, ever point to an int [of 32-bits or more], and therefore while I suspect the spec doesn't cover this, trying to calculate the difference in indices between two locations when neither location can hold the value in question is ... dodgy.

2) I also believe - although I'm not sure - that the spec says that pointer comparisons between arbitrary values are undefined! ie. you can compare pointers into a block returned from a single call to malloc/new, or you can compare pointers into an array, but comparing any two pointers is not guaranteed to do anything meaningful.
This would also mean the comparison could return literally anything.

3) On some systems, pointers actually contain more information than a memory address: for example, the low order bits contain information such as the type, size and format of data being pointed to, while the high order bits contain the actual addressing information. The comparison rule above is a direct consequence of C++ wanting to allow arbitrary schemes like this (since this is how pointers work in hardware on some CPUs!).

Again, that means you have no idea what meaning 0x2 or 0x6 have as a pointer. It's possible 0x6 is actually setting the hardware pointer flags PTR_READ_FROM_RANDOM_MEMORY_LOCATION and PTR_INVALID. Therefore you can't be sure what the comparison will return.

L. Spiro · Apr 27, 2006

Well, if by "logically deduced" you mean, "make a reasonable guess about what a particular version of a particular compiler might do", then yeah, you can do that. But undefined means just that: the compiler can legally do anything and remain spec-compliant. Can even change between compiler versions, which is why it's never any better than a guessing game.

Actually I do mean that, or at least roughly similarly.
The specification states when and where postfix and prefix increments are to be applied, and it states strictly the point (the sequence point) by which the side effects of those increments must be complete.
The reason it states that modifying variables twice between sequence points results in undefined behavior is because both implementations and code combinations are too broad to specify exact results for every situation.
Ultimately, it lays the ground rules, but then covers its tracks by specifying that certain cases are undefined, allowing leniency towards implementation methods.
This is why I stated that, with implementation aside, following the actual guidelines set forth by the ANSI C++ specification to a T, logical deductions can be drawn for most of the problems I have given.

However, by all means, you are correct. These are deductions and no compiler is guaranteed to handle them “logically”.
Even if you were sure of what to expect, you would still have to waste time testing to make sure the compiler itself does what you expect, and ultimately it’s best to just avoid most of these cases altogether.

3) On some systems, pointers actually contain more information than a memory address: for example, the low order bits contain information such as the type, size and format of data being pointed to, while the high order bits contain the actual addressing information. The comparison rule above is a direct consequence of C++ wanting to allow arbitrary schemes like this (since this is how pointers work in hardware on some CPUs!).

In the best-case scenario, this is platform-specific, not part of a standard set by the ANSI C specifications. In fact this applies to #1 also.
No part of C++ dictates that memory addresses have to be aligned; this is system-dependent, which means that a system may purposely compile the code to keep things aligned, but is required to do so only because of its own limitations rather than limitations inherent in C++.

No system is required by C++ to use registers a certain way, stacks a certain way, or to save/use pointers a certain way; some types of hardware were created to work this way, but by their own choices.

Again, it’s just not worth mentioning for my intentions in this thread.

Subtracting two pointers is the purpose of that question; nothing more.
I could have provided an example where I used malloc() to create two pointers and then subtracted them this way, and you wouldn’t have had anything to say.
But instead you decided to accuse me of playing a “who knows the most about C++” game and pick apart the nuances of my examples to make a point of your own.
And exactly what is your point?
Maybe you should be very clear on your exact intentions when you go about telling me exactly what things a person should be pointing out if he were to be playing that game.
Because accusing me of playing a “proving” game followed by nit-picking irrelevant details to make sure people know he is on top is just what someone should do, if he were just a tad spiteful.

For the record, I have no beef with you and I eagerly awaited your first reply, because I knew if there was one person here who would know about the tricks behind these questions, it would be you.
This test is for fun, and in the early posts it seemed as if people were actually having fun looking at these weirdo situations, so let’s try to keep it fun, shall we?

L. Spiro

dziugo · Apr 27, 2006

I think that it doesn't matter if you spoiler-tag your post or not. There were too many replies to accidentally read the tip/answer by entering the thread and reading the first visible line.

dziugo

ficedula · Apr 27, 2006

My point is that teaching someone that pointer arithmetic is possible in C++ is a good point to make, but not also teaching them that it's undefined except under some very specific circumstances is dangerous; you wouldn't want people to go away with the idea that "hey, we can perform all sorts of arithmetic on pointers, cool!". Or at least, you also want them to know: except in these cases, doing it may or may not work and we can't be sure.

Giving somebody the tools to shoot their own foot off is often desirable, but it's nice to mention how the safety works at the same time.

Same sort of idea applies with the pre-post increment operators: in my mind, the most important point to make with these under virtually any circumstances is that once you use two or more on the same variable between sequence points, the results are undefined. Not "well, it actually only does it once", but literally undefined.

(I should probably mention that the reason I feel so strongly about this is I'm currently porting our code base at work from Win32 to .NET and I've been bitten a few times by code where the person who wrote it just checked "does it work on the compiler I'm using", rather than "is this meant to work at all".)

The main reason I'm mentioning funky pointers containing non-address information is that, hey, I find that information kind of cool - how many people thought pointers were always just integers, a raw 'count' of bytes? It's sort of enlightening to realise "well, my PC does that, but..."

It's yet another thing I see people assuming all over the place, "well, pointers are just integers, right? I can just convert it to an integer and back again and it's just an address in memory..." - ie. they know enough to make it work some of the time...

L. Spiro · Apr 27, 2006

Our lines of thought aren’t too far apart; there has just been a slight misunderstanding.

The questions I posed have various intentions, from illustrating short-circuit conditionals to demonstrating undefined and unpredictable results.

(I should probably mention that the reason I feel so strongly about this is I'm currently porting our code base at work from Win32 to .NET and I've been bitten a few times by code where the person who wrote it just checked "does it work on the compiler I'm using", rather than "is this meant to work at all".)

Incidentally, do you think that programmer would have coded the same way after taking this test?
People are now seeing many examples of the types of code to avoid.
From the beginning people were seeing how many different results each compiler would produce, which has not only been fun and interesting, but a learning experience.

in my mind, the most important point to make with these under virtually any circumstances is that once you use two or more on the same variable between sequence points, the results are undefined. Not "well, it actually only does it once", but literally undefined.

I would like to think most people who took this topic seriously now have a very firm impression of this that will last a lifetime.
Now we have seen exactly how dangerous it can be to disregard this rule, and there is more awareness of the types of situations that can break it.

Likewise, a lot of programmers aren’t even aware that there are expressions that yield undefined results.

Spread the knowledge, I say.

L. Spiro

ficedula · Apr 29, 2006

Incidentally, do you think that programmer would have coded the same way after taking this test?

In my mind, there are two kinds of programmers: The sort that have to be hit by a brick in order to remember something, and the sort that have to be hit by a big brick.

The first impression many programmers have on seeing information such as that presented in the test (I don't exclude myself from this) is "Whoa, cool! I'll be a good little programmer and when I use it I'll make sure to test that it does what I want it to!". And then go away satisfied that they've got the point - test your code!

Really, of course, the point is "Don't fucking do this!", but the innate coolness of an Interesting Feature prevents people from getting that. I know generally people learn better when they find things out themselves rather than being just taught, but that seems dangerous with programmers - most programmers (again, probably including myself) think they're better than they really are.

I like to spell it out, preferably in big comments in the code/source repository/office memo:

USE THIS FEATURE AND I WILL HUNT YOU DOWN AND HURT YOU

(Once that's understood, sure, it's fun to have a discussion about these wacky features. I just like to get the line about hurting people in first.)

L. Spiro · Apr 29, 2006

I put up a disclaimer to thwart the evil-doers who want to try to use these tricks.

L. Spiro

Qhimm · May 1, 2006

Wah, I got here late. Let me just try the questions without reading the thread first (sorry if this looks terribly stupid if you discussed it to death already).

1. J is undefined, dependent on the compiler implementation. The standard does not specify exactly when the post-increment is performed during the expression evaluation; it may be done immediately after the retrieval of the value, or some time later up until the expression is "finished" (IIRC compilers are allowed to delay it all the way up to the ';', the end of the statement). Therefore, J can be either 60 or 61, depending on when the post-increment is applied.

2. I is 1. Had I initially been set to 1 instead of 0, there would be ambiguity for the same reason as above, though compiler writers would be likely to implement the compiler to apply the post-increment within the numerical expression evaluation, before moving on to the next part of the boolean expression. Why is this significant? I don't know, it's just ugly because of the high risk of ambiguity.

3. J is probably 2, but I think the standard doesn't guarantee that (since again it uses a variable which is also post-incremented within the same expression, and the order of events is not clearly defined). It is likely to work though, due to how boolean expressions are typically handled by compilers.

4. I'd say that the result is 1, for the same reason that (((int *)2) + 1) equals (int *)6. When offsetting pointers with integers, the integers are scaled by the pointer type size; in this case, an int, by 4 (bytes). I haven't tested it on actual compilers, but I imagine the same mechanics apply when subtracting two pointers. One significance of this kind of pointer arithmetic is that array indexing and pointer math becomes one and the same, e.g. array[5] = *(array + 5).

5. Ooh, nice. I'd have to say this is undefined as well. The post-increment can be applied within the right-hand side expression (resulting in I = 2), or it can be applied after the entire assignment expression as a whole has been evaluated (resulting in I = 3).

6. Undefined. Again, the order of evaluation/application is not specified, and additionally I haven't seen any specification of which side of comparison operators are evaluated first. If we make the assumption that the right-hand side is evaluated first, then J is 1. If the left-hand side is evaluated first, J is 0. I don't think the standard dictates either way.

7. Specify according to what? As I said I don't think the standard specifies the order and timing of post-increments, or even the order of evaluation of this kind of expression. Those indexings just become a (I++)*sizeof(structitem) + (++I)*sizeof(anotherstructitem) + offsetof(iInt), and added to the address of g_mMyStructArray, which again there's no specific order of evaluation defined. But fine, here's a best-effort attempt to guess how a current compiler would do it, if it followed the "increment as soon as possible" approach. (I++ == 0) evaluates to true, and I is incremented (1), then I's value (1) is taken, after which it is incremented again (2). g_mMyStructArray is indexed (2), I is incremented (3), I is incremented again (4) and used to index sAnotherStruct. So [2][4] = 1. Note again, however, that the compiler is well within its rights to optimize code by moving around a) where post-increments are applied, and b) the order of evaluation to calculate a final pointer offset. The only case where you can force the evaluation order of the left-hand side here is if you're using overloaded [] operators, forcing a left-to-right evaluation. Even assuming this would be the case though, the compiler is still free to hold up the post-increments until the end of the statement, resulting in [1] = 0 (quite a different result). The only assumption I've made here is that the right-hand side of an assignment expression is typically (possibly even definedly) done first.

Now to read through the discussion, I expect some goodies. 8)
