[GiM logo] gim.org.pl is down || odświeżony jogger (v.0.4) GiMa

Truly saying I'm not much interested in C++0x (or rather C++1x :P).
Today I've stumbled upon few examples of delegates:

  1. [](int x, int y) -> int { int z = x + y; return z + x; }
  2. [&, foo](int x) { sum += x; prod *= foo; }

In the second code & refers to 'full closure' (so sum and prod are variables from that closure, accessed by reference), and foo is copied from the closure by value (i.e. any changes to foo inside the delegate won't be seen outside the delegate).

Truly saying, I don't mind it, but ...
C++ already has awful syntax (you can argue, but that's a fact :P)
Am I wrong or is the proposal (?) of delegates' syntax are is just über-crappish?

If you look at above samples and your answer is nope, here's one more for you:

  1. [&](int* foo, int i){i[foo] += (bar->*baz)(i); }

P.S. Your comments will be appreciated

catz: [eng.lish] [kom.puterowe] [pro.gramowanie] [Techblog]
tagz: [] [c++0x] [c++1x] [syntax-horror]
dnia sobota, 03 październik 2009, 104040 by Michał 'GiM' Spadliński

Komentarze:

Proszę wpisy pisane po angielsku komentować również w tym języku.

In any given language you can write unreadable code and as such your last example is not quit fair in the respect of array indexing (that is, use foo[i] instead of i[foo] and code gets clearer).

The other non-obvious construct -- pointers to methods -- is so rarely used that again I'm not sure if using it as an example is entirely fair.

Having said that, I must admit that I agree to some extend with the point -- C++'s syntax is in fact somehow clobbered but in my opinion there are other more important aspects of the language that are worth more attention with templates as a perfect example of construct witch has design flaws no one will ever be able to fix without breaking backward compatibility.

dnia sobota, 03 październik 2009, 131952 by mina86

For one thing, I'd never want to a C++ compilers, let alone using the language itself.

dnia sobota, 03 październik 2009, 134950 by Dodek

@mina86: ok, I know it's not fair, but try to forget about semantic for a while and just *look* how crappy this looks (think of someone who doesn't have C/C++ background).

regarding member-function-pointer, they are used, they'd probably be used lot more if it's design wasn't so fscked up (how did it happend, they didn't do anything about that?)

as for backward compatibility: I truly have no freaking idea, why do the Comitee (Bjarne?) wanted backward compatibility.

I'd prefer breaking changes that truly *improve* the language, and different extension of a file or some kind of marker at the begining like I don't know:
#pragma C++0x
(So that compilers could still compile classic C++ code and C++0x when it's wanted)

dnia sobota, 03 październik 2009, 214911 by GiM

Someone who does not have C/C++ background usually does not need to look at the code? If code involves such mechanisms as pointers to methods and lambda expressions it is very likely the person won't be able to understand the code even if the syntax was simple.

And since I've already touched topic of pointers to methods, I think they are not used because functors and virtual methods are used instead. For instance, I never felt the need to use a pointer to method and I rarely used pointers to functions and that only in C (my favourite is to implement state machine with them where each function is machine's state).

And as the last thing, when breaking backward compatibility problems arise when you try to merge (read include) different files written for different versions of the language. It is not a big issue for languages with relatively small code-base -- benefits are greater then the cost of rewriting existing software -- but it's not the case with C++ (or C for that matter).

And I don't even want to mention the fact that one'd have to know all the versions and remember which one apply to each portion of the code. The point here is that if you once learn that some construct means something then it won't change in future versions of the language.

To sum things up, I don't think C++ has such a bad syntax -- sure, it's not something I'd hang on the wall (yes, I'd prefer Jessica Alba's poster) but then again I prefer colon over "extends" or braces over "begin"/"end" pair. You can write unreadable code in any language and as long as syntax is not totally screwed (think BrainF*ck) you can write readable code.

dnia sobota, 03 październik 2009, 230711 by mina86

mina86, but the problem with pointer to methods is more complicated than it seems it is. It's yet another facility that nobody uses. When C++ designers came across a problem, they frequently came up with two solutions, and when they could not decide which one to apply, they did them both. In result, we have STL with half of the function being useless - my favourite example is for_each: without boost::lambda the advantage over simple for loop is nonexistent, and boost is another level of bloat. We also have "facilities" which are in fact attempts to fix broken design. We have const_iterator, which enables "const iterators". However, if STL wanted to be consistent, it would also define volatile_iterator and const_volatile_iterator, and then nobody would ever bother to look at STL.

90% of users use up to 10% of C++, when you try to go further from C subset, the amount of bloat grows exponentially.

C++ hasn't bad syntax? It's the only "popular" language with context-sensitive syntax, how's that not bad? Have you ever tried to write a C++ parser? What does "Foo Bar(baz < xyzzy, bar > frub);" mean? Thing like ":" vs "extends" or "begin" vs "{" do not really matter here.

And regarding your first paragraph - you can use lambda expression and still have simple syntax. Just look at MLs, let alone Lisp, which has so simple syntax that you could say it has no syntax at all. Pointer to methods? Let's see:

http://paste.lisp.org/display/88117

Of course, this do not really resemble "method pointers" from C++, because C++ object system is not the thing that Alan Kay would call object system, but I digress.

dnia sobota, 03 październik 2009, 234204 by Dodek

@mina86: Although I don't agree with first argument I don't want to argue.

There are truly lots of pointer-to-member uses. They are used many times in both boost and stl. They are used in different places in MFC.
They'd probably be used on a daily basis by most programmers if it's design wasn't so flawed and if they hadn't got horrible syntax (truly unusable without macros).

I must say I quite don't get your third point. If such division really existed, there shouldn't be a problem linking normal C++ stuff with new C++0x stuff. Templates could be problematic, but you've already mentioned they're flawed.

As for the last thing, Brainf* has much more clearer and conciese *edit* syntax ;)


@Dodek: you will probably agree that boost is one big hack-around-C++, I don't know what's your opinione, but imo boost's _1 looks creepy :]

As for C++ parser, that's what I was having in mind, when writing examples, C++0x is just another step to make it even more context-sensitive ;)

dnia niedziela, 04 październik 2009, 001729 by GiM

Regarding pointers to methods, I'd guess they have been added because something like pointer to function existed -- it was a natural decision. If we can point a function why can't we point a method? That would be inconsistent and broken. So, the fact that both virtual methods and pointers to methods were created is, in my opinion, natural choice -- one could not have done it differently.

Also, their design is not flawed at all -- granted, ->* and .* looks a bit creepy (but you can get used to it) and declaration of such pointers even more (but usage of typedefs makes thinks easier) but other then that it's pretty straightforward. They are not used because in 99,9% of cases virtual methods and functors suffice. And I can't think of any place where they are used in STL, sorry, so if you have some particular code in mind please share.

And since we're talking about unused syntax it doesn't matter that 90% of programmers use 10% of the language features -- it's the case with most of the languages with big standard library.

Also, for_each is accompanied with a set of templates which make it somehow usable -- bind and company makes it somehow usable. Having said that, I do agree that without lambda it's still not what it could have been but that's what the new standard fixes by introducing lambda expressions.

And since we are so deep in STL lets look at volatile_iterator. There's only one thing I can say about it: It has no sense at all. I cannot imagine sane person using volatile keyword with STL's containers and if one were to define her own container which would point to a portion of volatile memory trivial iterator and const_iterator would behave correctly -- there is no reason to allow casting from (const_)volatile_iterator to (const_)iterator.

But let's get back to syntax. First of all Python is another popular context-sensitive language so C++ is not the only one. This however is in my opinion not that important, ie. I'm not concerned about parser's complexity that much.

Also, I never claimed that lambda expressions cannot have "clean" (for some definitions of the word "clean") syntax -- all I'm saying is that someone without C/C++ background has no use reading complex C++ code and someone with such background will understand the code after a while.

And as of using code written with different versions of the language -- clearly linking is not the problem here, you can link code written in assembler, C, C++, D, Pascal and so on so clearly various versions of C++ would be able to be linked together. Compilation is the problem here. For one thing parser would get bloated since it'd had to support various syntaxes but let's ignore that. Bigger problem would be if you'd want to use class/method/template/lambda defined in C++v1 header in C++v2 file.

dnia niedziela, 04 październik 2009, 103437 by mina86

"And since we're talking about unused syntax it doesn't matter that 90% of programmers use 10% of the language features -- it's the case with most of the languages with big standard library."

What are we talking about here? Syntax or standard library? STL is not that big, but yet some of the parts are still useless - using higher order functions like bind makes it even more useless - for_each is supposed to shorten one's code, not make it longer. And regarding syntax - yes, bloated syntax is a real problem. If I use 10% of provided functionality, it's fine, but when it comes to reading another person's code, who also uses 10% of functionality, albeit not intersecting with the part you know. It also greatly lengthens process of learning language, can be cause of more bugs, etc etc.

"And since we are so deep in STL lets look at volatile_iterator. There's only one thing I can say about it: It has no sense at all."

But it shows some inherent problem with C++ type system and iterators. It would make much more sense if one could use "const iterator" instead of "const_iterator", but unfortunately language design makes it impossible - nobody had in mind STL when C++ was first defined. Instead, we need to define its own iterator for every keyword combination.

Of course const, the way it is, is yet another problematic feature of C++, it causes problem both when you use it and when you don't.

" First of all Python is another popular context-sensitive language so C++ is not the only one."

In fact, you're wrong. Go figure. And it's kind of ironic that you bring small, clean and simple syntax of Python to face C++ Godzilla.

"all I'm saying is that someone without C/C++ background has no use reading complex C++ code and someone with such background will understand the code after a while."

But almost all of us know at least some C, and yet this syntax is creepy. Also, using lambda isn't really complex, considering it is the basis of functional programming. However, I agree you can't make lambda look consistent with C++, you can make it consistent with some part of C++, since all the parts are not really consistent with themselves.

I just think the C++0x will fuck up the remaining bit of C++, and then no new programmer will bother to look at it.

dnia niedziela, 04 październik 2009, 105854 by Dodek

You brought STL so I started referring to it. If you want to stick to syntax then I don't agree that only small portion of it is used by most programmers. Yes, they are keyword like and, or, etc. which are used by no one but other then that most of the syntax is used.

The issue with const_iterator is not so big. Granted, it would be nice if notion of const iterators were somehow built-in the language but honestly it's not something that strikes me as a big flaw or something. Just a nuisance.

And no! const is not a problematic feature if one knows how to use it. If one doesn't then there's no salvation for her anyway but if you use const correctly it can save you a lot of trouble.

As of Python -- it is in fact context sensitive since it depends on number of spaces at the beginning of the line.

"However, I agree you can't make lambda look consistent with C++, you can make it consistent with some part of C++, since all the parts are not really consistent with themselves." -- I have no idea what you've just said.

dnia niedziela, 04 październik 2009, 112218 by mina86

"As of Python -- it is in fact context sensitive since it depends on number of spaces at the beginning of the line."

That's not what context sensitivity is. There are no spaces after lexing. Again, go figure.

dnia niedziela, 04 październik 2009, 112527 by Dodek

Clearly (dot represent a space; Markdown is disabled?)

if foo:
....bar
....baz

is something different then

if foo:
....foo
bar

so how is it not context-sensitive? It doesn't matter if lexer handles the context -- at some point the spaces have to be interpreted.

dnia niedziela, 04 październik 2009, 113949 by mina86

Please, for the fuck's sake, read what context-sensitive grammar is.

dnia niedziela, 04 październik 2009, 114044 by Dodek

I'm well aware what context sensitive language is -- it is a language which grammar cannot be described using only "T -> whatever" rules. If you are claiming Python is context insensitive please do provide context insensitive grammar for an if statement.

dnia niedziela, 04 październik 2009, 114314 by mina86

/* terminals = {int_literal, indent, dedent, <, >, =, +, -, *, /, if, :, else}
entry point = toplevel_stmt */

op_code = < | > | = | + | - | * | /
expression = int_literal | int_literal op_code int_literal
expressions = expression | expression expressions
if_stmt = if expression : indent expressions dedent
| if expression : indent expressions dedent else : indent expressions dedent
toplevel_stmt = if_stmt | expressions

Oh please.

dnia niedziela, 04 październik 2009, 115002 by Dodek

The only problem with your grammar is that it does not apply to a plain/text source code. You assumed existence of a lexer which produces indent and dedent tokens and therefore hid the context-sensitivity of the grammar in the indent/dedent tokens. Using the very same tricks I can say I have a clever C++ lexer which produces different tokens for '<' and '>' depending on whether they are operators or angle-parens used in templates.

dnia niedziela, 04 październik 2009, 123801 by mina86

Who the fuck talks about grammars with regard to source code? Grammar is applied to the stream of terminal symbols, produced by lexer. Please, stop this nonsense.

dnia niedziela, 04 październik 2009, 124135 by Dodek

Not at all. A source code is given as an input to compiler or interpreter which parses it. So to input is stream of characters not a stream of tokens.

As I've written before, if you really insist on your understanding then C++ is not a context-sensitive language if you have a lexer which produces symbols such as LT or TPL_OPEN for "<" and GT or TPL_CLOSE for ">" depending on context.

It's true that when you consider lexer and parser as two separate entities you can construct a context-insensitive parser for any kind of language but then lexer has to handle the context -- there's no way around it, at some point context will have to be dealt with.

So, to "stop this nonsense" you have to choose from either of: (i) both C++ and Python are context-sensitive or (ii) none of C++ or Pythong is context-sensitive.

dnia niedziela, 04 październik 2009, 134130 by mina86

You still don't get it. Context-sensitiveness is not something to be believed in, like you wish it were (considering your last paragraph) - it's well-defined mathematical term, which you seem to have no idea about, other way you would use "context-free" instead of "context-insensitive". Amd if you do not believe me and refuse to do any research, please, ask anyone who you think knows the topic better than both me and you - everyone will tell you that the C++ is the context-sensitive one.

"Not at all. A source code is given as an input to compiler or interpreter which parses it. So to input is stream of characters not a stream of tokens."

We're talking about formal grammars here. Read the definition of formal grammar - there isn't any "source code" there.

I'm done here.

dnia niedziela, 04 październik 2009, 140350 by Dodek

The fact that I've used "context-insensitive" only proves I'm not English language guru.

Also, I never used "source code" in context of grammar -- what I've written is that compiler/interpreter's input is source code and as such input is stream of characters not stream of tokens. Tokens are created after lexer handles the characters.

Understanding of what context-sensitive grammar is is not an issue here. Problem is you try to apply the definition to a stream of tokens rather then stream of characters where you should.

I'll write it for the third time: Assuming I have a C++ lexer which produces LT, GT, TPL_OPEN, TPL_CLOSE and all the usual tokens I can produce a context-free grammar for C++ parser, ergo (applying your understanding) C++ is context-free.

dnia niedziela, 04 październik 2009, 141507 by mina86

"The fact that I've used "context-insensitive" only proves I'm not English language guru."

No. "Context-insensitive" is perfectly valid English, the thing is this term is never used with regard to grammars - instead, we use "context-free". It proves that you did not know what context-sensitiveness was - otherwise you would at least encounter the "context-free" term.

" Problem is you try to apply the definition to a stream of tokens rather then stream of characters where you should."

You CANNOT use the "context-sensitive" term with regard to source code, because it has only meaningful sense when talking about grammars - and grammars are about terminals, not characters.

" Assuming I have a C++ lexer which produces LT, GT, TPL_OPEN, TPL_CLOSE and all the usual tokens I can produce a context-free grammar for C++ parse"

The problem is that you can't write lexer doing that. In order to know, which '<' is "less than" and which opens a template, you need to parse the source code first. Having done that, you don't have a lexer anymore. You have a parser.

Please, read about formal language theory.

dnia niedziela, 04 październik 2009, 142351 by Dodek

Your logic is amassing. Why do you assume that I learned about grammar from English sources? One can have perfect understanding of formal grammar without the need to understand English and so read English sources therefore not encountering the "context-free" term.

Whether or not I used the correct term is irrelevant here and proves nothing. I assure you, if the discussion had been in Polish I would have used the correct term.

And once again, I never used "context-sensitive" with regard to source code as such. All the time I'm saying about stream of terminals such that the set of terminals is set of Unicode characters.

And finally, riddle me this, how Python lexer knows where to put "indent" and "dedent" tokens? It needs to keep track of indention of previous lines and compare it with indention of current line therefore it *handles context*.

dnia niedziela, 04 październik 2009, 143928 by mina86

Python uses LL(1) context-free grammar (it's even one of dev-team goals to never exceed LL(1) parser complexity;
see http://docs.python.org/reference/grammar.html, CPython sources, PEP3099).

Just sayin'.

dnia niedziela, 04 październik 2009, 145405 by PiotrLegnica

Maybe so, but it's possible only because lexer handles the context by providing INDENT and DEDENT tokens therefore lexer itself is context-sensitive.

For instance, there are following characters on lexer's input: :, new line, space, space, space, space, 'i', 'f', space. What tokens will lexer produce?

If language's grammar and lexer used were context-free one would be able to answer above question but because INDENT and DEDENT tokens depend on history of previous lines' indention (ie. context) one cannot answer above question.

dnia niedziela, 04 październik 2009, 150523 by mina86

So are we talking about language grammar, or about how does the lexer produce tokens? Because the former doesn't really care about the latter.

dnia niedziela, 04 październik 2009, 151533 by PiotrLegnica

About language grammar of course but keep in mind the source file is not a stream of tokens produced by a lexer -- it's a stream of (say Unicode) characters.

You cannot take parser and say it's language's grammar because it's not the case. How numbers are encoded, can you write 0xff or not, what "\n" in a string means as well as how comments are specified are all parts of language grammar and thy are usually handled by the lexer.

If you want to ignore lexer then I say that a language which looks similar to Python but instead of indention uses "{" and "}" characters (which produce INDENT and DEDENT symbols) has identical grammar to Python.

dnia niedziela, 04 październik 2009, 152325 by mina86

But we are talking about grammars here. Don't come up with some silly excuse that file is not a stream of tokens - grammar does not really give a fuck about that.

Yes, if we replaced indentation rules with braces, or begin/end or what not, the grammar would be the same, STILL CONTEXT-FREE.

Just end this silly discussion and read some more about the topic.

dnia niedziela, 04 październik 2009, 152821 by Dodek

At least it's now clear to me that under "language grammar" you understand "grammar used in language's parser". Your screams, swear words and personal attacks won't change the fact that you're wrong in that matter. Language grammar is the set of rules that are applied to the input of the compiler/interpreter which in case of both Python and C++ is a stream of characters. Whether the whole process is divided into two parts: a lexer which transforms the input into another language (which terminal symbols are tokens with optional values associated to them) and a parser which accepts that other language or if it's handled by a single entity is irrelevant -- the whole transformation is what is language's grammar.

I mean I can't even understand why you so calmly accept the claim that Python and Python+braces languages have the same grammar. Do you really feel like the following fragments:

if foo:
..bar
..baz

and

if foo: {
bar
baz
}

have the same grammar? It's really beyond me.

dnia niedziela, 04 październik 2009, 155248 by mina86

"Language grammar is the set of rules that are applied to the input of the compiler/interpreter which in case of both Python and C++ is a stream of characters."

No, it's not, and we can resume this discussions as soon as you understand it.

dnia niedziela, 04 październik 2009, 155709 by Dodek

Hmm...

OK, then.

Yes, it is, and we can resume this discussions as soon as you understand it.

dnia niedziela, 04 październik 2009, 160303 by mina86

What a load of bollocks. Context-sensitivity is what makes parsing Perl turing-complete. See http://www.perlmonks.org/?node_id=663393

And it's C++'s suck, not static-typing's suck. This is valid Haskell code:

f x = x

In C++ it would probably look like:

<integer>identity<integer>(x) { return x }

or something like this.

Real static typing languages can actually promise any sort of correctness that comes with static-typing overhead -- like, guaranteeing at compile time that a binary tree is properly balanced. C++, Java and related garbage can't even guarantee that, because in these languages static typing is nothing more than overhead. I get so sick and tired of arguing with C++/Java apologists.

dnia niedziela, 04 październik 2009, 180736 by frob

Oh and mina86, do you even know what Chomsky hierarchy is?

dnia niedziela, 04 październik 2009, 180900 by frob

Okey... I admit, I have no idea what does static typing have to do with binary trees.

Oh and frob, do you even know what "turing-complete" means?

dnia niedziela, 04 październik 2009, 181705 by mina86

What I meant is that to parse Perl one must solve the halting problem.

dnia niedziela, 04 październik 2009, 181911 by frob

frob, well of course he knows, for he knows how to decide which "<" is "less then" and which is template opening brace prior to parsing the code - how could he not know so simple thing?

dnia niedziela, 04 październik 2009, 182137 by Dodek

Ok, I couldn't comment earlier.

@Dodek: please calm down, and stop swearing :>

@mina86: It seems you haven't ever tried to write parser for a language.
C++ is context-sensitive, Python is not, please DO read more about grammars BEFORE further arguing, because you simply make fun out of yourself.

Here's good link for you:
http://wazniak.mimuw.edu.pl/index.php?title=J%C4%99zyki%2C_automaty_i_obliczenia


Please to all of you, end this OT discussion.

Back to the main topic:
"If we can point a function why can't we point a method? That would be inconsistent and broken."

That's obvious, but pointer-to-methods has been ill-designed. Pointer to method should be basically
pointer-to-object + pointer-to-entry-in-vtable and that
would solve LOT of problems.

I truly don't understand why (due to sick design) I can't simply do:
foo = &(barObject->bar);
foo = &(bazObject->baz);
(where Bar and Baz classes are totally unrelated).
(This would be possible without a problem, if they were implemented like a pair of pointers, as written above).

btw: you've written that you use pointers to function, when doing state machines. Even cooler things can be done while putting pointer-to-members in a <list> or even <map>.

"I can't think of any place where they are used in STL"
Take a look at mem_fun* adaptors.

dnia poniedziałek, 05 październik 2009, 155742 by GiM

You are mistaken, I have written several parsers (most using bison, some by hand using recursion) and I'm well aware what a grammar is.

What we cannot seem to agree on is how "language grammar" should be defined. You claim it's the grammar used in a parser I claim is grammar which you apply to a source files, ie. a stream of characters.

My rationale is that (i) you can use different parsers (with different grammar) for the very same language (say you have a single C lexer which treats literals like "123u" as a single token UNSIGNED_INTEGER and another which produces a pair of INTEGER and INTEGER_MODIFIER_UNSIGNED tokens) and (ii) you can use the very same parser to parse two different languages (where example would be Python and Python+braces).

Anyway, back to the topic.

First of all, in your concept, I don't see why vtable need to be considered -- a simple pointer-to-object + pointer-to-method would suffice since even if the method is virtual pointer to it could be resolved when assigning value to the pointer. This however is not relevant.

What is relevant is that in my opinion is not a matter of wrong design -- you simply describe another type which would be used for different things then pointers-to-methods present in C++.

I do agree however that such a type would be handy in some situations.

As of mem_fun you're right -- I forgot that funny stuff is going with templates used for creating functors.

dnia poniedziałek, 05 październik 2009, 210740 by mina86

"What we cannot seem to agree on is how "language grammar" should be defined. You claim it's the grammar used in a parser I claim is grammar which you apply to a source files, ie. a stream of characters."

Chomsky hierarchy is well established in computer science, so a definition of "grammar" does not pose a problem, which, in fact, is you, who try to introduce new definition that all of use refuse to accept.

"My rationale is (...)"

...not relevant, as we have solid and well defined grammar definition for decades now.

dnia poniedziałek, 05 październik 2009, 212604 by Dodek

Not at all! I fully agree with the definition of grammar as a (N, E, P, S) tuple. Moreover, I do not argue whether grammar used in Python's parser is context-free or not -- in fact I am willing to agree that it is!

What I do not agree is a claim that grammar of a language and grammar of a parser used in one particular compiler/interpreter of the language is the vary same thing!

You say Python's parser uses context-free grammar ergo Python is context-free language. I say Python's lexer is context-sensitive ergo Python is context-sensitive language.

dnia poniedziałek, 05 październik 2009, 213625 by mina86

Python can be implemented using a context-free parser, therefore its grammar is context-free. C++ cannot.

dnia poniedziałek, 05 październik 2009, 214307 by frob

"I say Python's lexer is context-sensitive ergo Python is context-sensitive language."

What is "context-sensitive language"? What is "context-sensitive lexer"? You introduce new terms without definitions.

I'm getting tired of this. Please, provide me with a citation supporting your claims, otherwise it is just your personal wrong, uneducated opinion.

dnia poniedziałek, 05 październik 2009, 214355 by Dodek

Oh wait, I misunderstood your last comment. But dunno if context-sensitivity applies to lexing.

dnia poniedziałek, 05 październik 2009, 214508 by frob

@mina, no, it's not indentation errors are nor recognized by lexer, but by a token-parser.

Please stop arguing about this. This is OT!

dnia poniedziałek, 05 październik 2009, 214612 by GiM

When implementing a whitespace-significant grammar like Python's, I preserved whitespace as tokens instead of stripping them. Then after lexing but before parsing, I changed individual whitespace and newline characters to '1+-whitespace' and '1--whitespace' symbols. Perhaps it could be done more elegantly.

Dodek, don't be so elitist. Next you'll be just bashing everyone without a PhD :)

dnia poniedziałek, 05 październik 2009, 214840 by frob

frob, considering I do not have one, it's highly unlikely ;)

dnia poniedziałek, 05 październik 2009, 215535 by Dodek

One more point to end this discussion,
As PiotrLegnica already written, python can be parsed by LL(1) parser, this itself is a proof that it's context-free.

As for C++, here is simple example that show it's context-sensitive:

a)
class C {};
C a(); // object of type C

b)
typedef int C;
C a(); // forward declaration of a function
int main() { reutrn 0; }
C a() { return 66; }

Please do not argue anymore about grammar since that doesn't have to do anything with this post, and I'll turn into bad facist and will start to delete comments :P

dnia poniedziałek, 05 październik 2009, 221112 by GiM

As a matter of fact you are mistaken. In both contexts "C a();" is declaration of a function which can be easily tasted by the following code (which does not compile):

struct C { int a; };
C a();
int main(void) { return a.a; }

dnia poniedziałek, 05 październik 2009, 221617 by mina86

this itself is a bit strange (especially when you declare C a() inside the function) but how about this one:

ex.a)
struct B {};
B b;
int main() { b = B(); }

ex.b)
int B() { return 6543; }
int b;
int main() { b = B(); }

dnia poniedziałek, 05 październik 2009, 224300 by GiM

Still, it's not an example of context-sensitiveness of C++'s syntax. Parsing main's body produces identical parse tree in both examples.

Templates is what makes C++'s syntax context-sensitive in particular the use of angle-braces, ie.:

foo < bar , baz > qux;

can be either two expressions "foo < bar" and "baz > qux" joined by a "," operator or a definition of "qux" variable of the type "foo<bar, baz>".

dnia poniedziałek, 05 październik 2009, 224918 by mina86

no it doesn't, once you have call to ctor, and once to function, how's that "the same parse tree"?

and here is even better in C:
#if 0
int x, y;
#else
typedef int x;
#endif
int main() { x* y; return 0; };

and yes, templates provide even more ways for amiguitions.

dnia poniedziałek, 05 październik 2009, 230106 by GiM

Whether it's a call to function or constructor is analysed later, not by parser. Writing it in lisp-like format parse tree for main's body in both cases is: (statement (assignment (identifier "b") (function-call (identifier "B") (function-call-arguments))))

As of your example with preprocessor I think it's actually correct which shows C's syntax is context-sensitive. Of course preprocessor has nothing to do with it but what's important is that "x * y;" can produce either of two parse trees: (statement (multiplication (id "x") (id "y"))) or (definition (type (id "x")) (pointer (variables (id "y")))) which is rather important since depending on meaning the following code is either valid C89 or not:

int main() { x * y; int a = 0; return a; }


But let me get back to the Python vs. C++ discussion for a moment to add one final comment and then I shall finish the topic letting readers decide who they think is right.

First of all I'm not aware of any formal definition of a lexer (since lexer is not really something that theory defines or cares about. It's merely a helper for constructing compiler/parsers and as such a tool of engineers rather then mathematicians. If one would really want she could create a compiler without the need for lexer -- parser's grammar would become a bit larger but other then that it is possible) as such but we can easily came up with a correct formal definition by analysing what lexer is doing.

Lexer takes a stream of characters on it's input and produces stream of tokens on it's output. That's exactly what something I know under a name of "translation schema" (pol. schemat translacji) or "translator" which is a (N, T, P, S, Act) tuple where N, T and S have their typical meaning, Act is a set of actions and P is a set of productions such that their right side is an element of (T u N u Act)^* (ie. like the usual productions expect there can be actions on the right hand as well).

As an example I've posted a grammar of a simple expressions along with a translator which translates simple expressions into RPN form at <http://pastebin.org/36543>.

Lets elaborate on the grammar I've posted. Among characters there are two non-character terminal symbols: ID and NUM. Let say we'd like to write a parser which uses presented grammar. Clearly, it cannot take a stream of characters as input because set of terminals used in the grammar disagrees. What we need is a lexer. For instance one that I've posted at <http://pastebin.org/36559> (for the sake of simplicity I've skipped calculating value of ID and NUM tokens).

From those examples it should be obvious that truly lexer is a translator which translates one language into another.

Now, if we remove actions from production used in translator we get simple productions present in "normal" grammars therefore we can apply all the classifications of a formal grammars to translators and so define "regular translators", "context-free translators", "context-sensitive translators", etc.

Since lexer is simply a translator we can, for the sake of our discussion define, terms such as "regular lexer", "context-free lexer" or "context-sensitive lexer".

Usually, compilers and interpreters use "regular lexers" -- lex or flex lexer generators use regular expressions as their source.

Python on the other hand uses a "context-sensitive lexer" since it need to investigate size of indent of lines preceding the one it actually analyses.

As such, I strongly believe Python is a context-sensitive language even if the sensitiveness is so trivial that it can be easily handled by lexer.

dnia poniedziałek, 05 październik 2009, 232502 by mina86

NO IT DOES NOT.

Even assuming what you've written, lexing of a python can be done using FSA with a stack, which makes it's lexer
context-free.

EOT.

dnia poniedziałek, 05 październik 2009, 234147 by GiM

@Dodek: Please remember that grammars operate on symbols over some alphabet. Usually, in computer science, they equal to the input streams, ergo characters. You cannot avoid them while analysing every particular language. Yes, you are right, Python is context-free language over alphabet with additional terminals (indent-start, indent-end). mina86's deduction also makes sense - but over alphabet limited to common byte characters. Please, both of you, specify the alphabet you are talking about first. After that you may analyse and prove your grammars.

@GiM: Are you sure you can count spaces using FSA? Rather I(nfinite)SA...

dnia wtorek, 06 październik 2009, 230932 by Zboczuch

@Zboczuch: It passed some time since I had to deal with grammars, but iirc *non-deterministic* FSA is enough to prove, that language is context-free, and correct me if I wrong, but I think you can count spaces using NFSA.

(Sorry for the delay in response)

dnia czwartek, 08 październik 2009, 215910 by GiM

With finite number of state (no matter if automa is deterministic or not since those two are equivalent) you can count spaces only by placing them on stack. But if you do how do you compare number of spaces in two lines without destroying information present on the stack already?

dnia czwartek, 08 październik 2009, 222014 by mina86

@mina86: Of course it is possible (see classical grammar for {a**n b**n} language).

@GiM: Just few seconds ago I noticed that you mentioned about FSA *with stack*. Naturally, this one is able to count spaces.

@All: Regarding the problem whether Python or C++ are context-free languages or not, I always thought that the pattern (identifier declaration --> further its usage) makes grammars for said languages context-sensitive.

I mean, to prove that following Python program does not belong to the language (formally)...

#v+
def foo():
pass

bar()
#v-

... information about context is obviously necessary.

dnia czwartek, 08 październik 2009, 234626 by Zboczuch

"(...) the pattern (identifier declaration --> further its usage) makes grammars for said languages context-sensitive."

This statement is false.

dnia czwartek, 08 październik 2009, 234930 by Dodek

You can count spaces using FSA with stack but you cannot compare it. You clearly need two stacks or a single stack with two pointers.

http://groups.google.com/group/comp.theory/browse_thread/thread/9c9ccec919c03f79/d2143c8ec6d98900?ie=utf-8&oe=utf-8

dnia czwartek, 08 październik 2009, 235104 by mina86

@mina86: I think it's quite natural, to assume, that indentention is limited (I mean indentation level).

Lexer only needs to detect inconsistent dedents, bad indents are handled by parser.

Assuming that (I mean limited indentation level) you can build PDA that handles that (give it a try).

dnia piątek, 09 październik 2009, 224938 by GiM

If you assume any limits then the whole discussion is pointless.

dnia sobota, 10 październik 2009, 101813 by mina86

@mina: probably, discussion is one big OT, nobody really cares if lexer is "context-sensitive" or not.

BTW: you can give writing python's lexer as an assesment in high-school, giving C/C++ lexer (even without lexing the preprocesor), probably wouldn't be so good idea, especialy with it's trigraphs and freakin' comments.

unsigned int b = ??<'??/n'??'(--a?a:??-'??/r')??>;

dnia sobota, 10 październik 2009, 143807 by GiM

Dodek started it not me. He claimed C++ is "the only 'popular' language with context-sensitive syntax" which I cared to disagree. So as far as I am concerned, I don't care whether C, C++ or Python's syntax is or is not context-sensitive.

dnia sobota, 10 październik 2009, 153636 by mina86

What I claimed was well accepted fact, you were the one trying to refute it.

dnia sobota, 10 październik 2009, 154054 by Dodek

I promised I won't get back to this pointless discussion and I'm not going to.

dnia sobota, 10 październik 2009, 154217 by mina86

Please don't tell I started it then.

dnia sobota, 10 październik 2009, 154313 by Dodek

Someone who "knows" that pointers to members... comments about C++ syntax?? LOL.

Read boost::bind, boost::function. Pointers to members is such a fundamental technique in any decently complex C++ programming project... think threads.

dnia sobota, 12 grudzień 2009, 134544 by csabill

..tożsamość..:
..meritum..:
..lokum..:
Wpisz kod:code