Friday, October 28, 2005

Recordings of 24h of DeXter

The recordings of the 24 Hours of DeXter sessions are now available in streaming and downloadable mp3 formats. There is a ton of interesting, high-quality technical listening material here.

Not totally unbiased, I must recommend my own session where I talk about two samples that will most likely be included on the Delphi 2006 CD (a Delphi7 compatible memory manager, and a TComplex record with operator overloading for Win32), Jon Shemitz' upcoming classic .NET for Delphi programmers book (where I am the technical editor and may end up contributing an Appendix on the Delphi language), and our great products at Infront AS, including The Online Trader™, X-Quote™ and WebBroker On Steroids based WebQuote™ engine.

Other recommended sessions (I haven't heard them all yet) are:

Danny Thorpe - "Danny details the new Delphi compiler enhancements in DeXter, such as records with methods and operator overloading in Win32, and improved function inlining."

Allen Bauer - "Allen discusses the new features of the DeXter IDE."

Pierre Le Riche & John O'Harrow - "Pierre talks about the new memory manager in DeXter. He discusses backward compatibility, the new features, as well as the differences in behavior between the old and new memory managers. He will also give a technical insight into the inner workings of the new memory manager. John discusses the Fastcode project: What it is, who it is, the history behind it, and the contribution that it has made (and continues to make) to Delphi and the Delphi developer community. He also mentions some of the improved runtime library functions that were developed by the Fastcode project and are now incorporated into DeXter."

Chris Hesik & Alastair Fyfe - "Chris and Alastair cover all that is new, improved and cool about the Integrated Debugger for Dexter, including features for both native and managed debugging. They also discuss what C++Builder 6 users can expect from the debugger when moving to Dexter."

But by all means, do check out the other sessions - I'm sure you'll find your own favourites. 
     
During the interview and live chat I got some questions about where my "stuff and tools" can be downloaded. I might blog more about some of these items later, but you can browse and download several of my tools from Borland's excellent CodeCentral repository.

I'd also recommend reading my articles in The Delphi Magazine, including the one in issue 82 about the WOS (WebBroker On Steroids) framework. If you haven't already, do yourself a favour and order the CD with all the back issues - a real gold mine of high-quality information. Most of my articles come with source that can be downloaded for free here.

Let me know if there is any of this stuff you'd like me to blog about.

Monday, October 24, 2005

I'll be on the 24 Hours of DeXter live audio chat!

Borland's amazing live audio chat, 24 Hours of DeXter, is on as I write this (10:47 Oslo time). You can plug in and listen to the audio, join the text chat to ask questions (roomkey is "livechat"), and even win prizes if you register your presence! Be sure to tune in around 16:00 Oslo time (15:00 GMT, 07:00 AM CA time) when I will join the audio chat:

"Hallvard will talk about his Win32 memory manager clone, and his Win32 record operator overloading example for DeXter. He will also talk about his company's new product, the leading Nordic financial real-time information and trading product, The Online Trader, and how they developed it using native Delphi."

Be sure to /ask some cool questions from the chat window!

Friday, August 26, 2005

Danny Thorpe on Unicode and VCL

As always, debates about Delphi and its future rage on in the borland.public.delphi.non-technical newsgroups. Often feelings run high and speculation, FUD, trolling and flamewars are the order of the day - you have been warned ;).

One way of getting at some nibbles of good, technical and useful information from the newsgroups is to read them "vertically" - I find myself often arranging the newsgroup posts by poster name, and reading most or all posts of people I know from experience are good posters.

One never disappointing source of such enlightening posts is Borland's Chief Scientist Danny Thorpe. Recently he posted some interesting points about how he views the challenge of a Unicode enabled native VCL (VCL for .NET already supports Unicode, of course). You can click the link to see Danny's full posts in context from the Google cache.

Here is my interpretation of what I understand are the main points:

  • Delphi .dfm files are already Unicode ready - strings are stored as utf-8
  • Delphi already has the types required to support old and new code (Char/AnsiChar/WideChar
    and String/AnsiString/WideString)
  • Keeping both Ansi VCL and Unicode VCL means: new 3rd party controls, numerous porting issues (char size etc.), duplicate IDE designers, etc., etc.
  • For performance and memory usage reasons, WideString should be made reference counted (using OleStr for external calls).
  • It makes sense to keep Win32 VCL Ansi, while targeting Unicode VCL for a Win64 Delphi platform

Here are some quotes from Danny's posts (published here with permission, of course) - emphasis is mine:

"Danny Thorpe" wrote:
> Does anyone know if Borland if ever plans to add Unicode support in VCL.

The main question is: How much compatibility are you willing to sacrifice to get a Unicode VCL? Unicode VCL for Win32 will not be fully compatible with many third party components out there. Unicode VCL for Win32 will require new component designers in the IDE that will not be compatible with Ansi VCL. Unicode VCL for Win32 will require new design interfaces in the IDE which will not be compatible with the existing design-time interfaces.

 [...]

Yes, we intend to produce a Unicode VCL. We already have in VCL.NET, and the only sane choice for 64 bit VCL is all-Unicode. The cost of adding Unicode support is less when you are starting with a new platform base which already has a compatibility barrier.

-Danny

Another quote:

"Danny Thorpe" wrote:
> The only thing you would have to do is update any literals stored in the DFM.

String literals in DFM files are already stored as UTF-8, a compressed Unicode encoding. UTF-8 looks like ANSI/ASCII for chars < 128. No DFM update utility is required.

> The break would be so minor, it shouldn't take more then a week to convert several hundred units.

The breaks I'm referring to run far deeper than DFMs. How much code do you have that runs through a PChar array by incrementing a pointer by one? In a Unicode world, PChar = PWideChar, which means each char is 2 bytes.

Similarly, any code that scans a string assuming that the first zero byte is the null terminator will fail with Unicode strings, because most Unicode chars (for English) have a zero high byte.

For most Win32 APIs involving string data, there are matching Ansi and Unicode definitions. But not all. Which of the Win32 APIs that you rely on today are not symmetric?

How much code do you have that is aware of multibyte character encodings for Middle Eastern or Far Eastern languages? In a Unicode world, most MBCS gymnastics are completely unnecessary and most are benign, but a few MBCS code patterns actually fail on Unicode. See the byte assumption above.

WideStrings are currently implemented by Delphi as OLEStr, aka BStrs, allocated using the SysAllocString Win32 API. These are not reference counted, and are rather promiscuous in copying themselves for every reference. Clearly, the Delphi WideString implementation needs to be changed to a reference counted WideString to save memory and performance if WideString is to become the primary string data type. But that means Delphi's WideString will have different allocation semantics from OleStr. Reference counted WideStrings will have to be converted to single-reference copies before being passed out of the application to Win32 APIs expecting PWideChar buffers.

Breaking the WideString = OleStr type alias means that all the Win32 APIs that are now declared as taking WideString will need to be changed to OleStr. We'll handle Windows.pas and the other Win32 API units we provide, but you will have to do the equivalent work on any other DLL API declarations your applications use. Until you find them all and fix them, your app will compile fine but will crash mysteriously at runtime. The compiler can't help you here because the compiler can't tell if the DLL you're calling actually expects OleStr or if it's a Delphi DLL that's actually expecting a Delphi reference counted WideString. The compiler has to rely on you to get the declarations right.

If your code and the components you use have been ported to Linux or .NET in the past, then chances are these kinds of things have already been found and modified to be char size agnostic.

Unicode VCL sounds like such a simple, little thread... until you start pulling on it.

-Danny
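To make the char-size and zero-byte points in Danny's post a bit more concrete, here is a minimal sketch of my own (not code from Danny's posts). NaiveByteScan is a hypothetical helper that scans byte by byte for the first zero, the way a lot of Ansi-era code does. Fed the raw bytes of a WideString, it should stop after the first character on little-endian Win32, because the high byte of 'H' is zero.

```pascal
program CharSizeSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper: counts bytes up to the first zero byte,
// i.e. it assumes one byte per character.
function NaiveByteScan(P: PAnsiChar): Integer;
begin
  Result := 0;
  while P[Result] <> #0 do
    Inc(Result);
end;

var
  A: AnsiString;
  W: WideString;
begin
  A := 'Hello';
  W := 'Hello';
  // Fine for single-byte data: finds the real terminator
  Writeln('Ansi scan: ', NaiveByteScan(PAnsiChar(A)));
  // Deliberately wrong: treats 2-byte chars as single bytes, so the
  // scan ends at the zero high byte of the first character
  Writeln('Wide scan: ', NaiveByteScan(PAnsiChar(Pointer(W))));
  // Char-size aware: buffer size computed from the element size
  Writeln('Bytes needed for W: ', (Length(W) + 1) * SizeOf(WideChar));
end.
```

The last line shows the kind of habit that keeps code char-size agnostic: multiply by SizeOf of the character type instead of assuming it is 1.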

Final quote:

"Danny Thorpe" wrote:
> many applications (particularly those connecting to external systems) can never be 100% unicode. They will always have a mix of unicode and non-unicode sections.

True, there is always a need to be able to specify which parts are wide and which parts are not. That's why we have 3 char types (AnsiChar, Char, WideChar) and 3 string types (AnsiString, String, and WideString). All those types continue to exist in Unicode places such as Delphi.NET and Kylix, but the definition of the middle one changes.

The issue is not that there is missing capability in the types. The issue in any port or redefinition of core semantics is that people very rarely write code that is multi-platform ready unless they are actually testing and debugging across multiple platforms. If you write your code to always use the never-changing types whenever you incorporate assumptions about char size, and always use size-flexible types when you should, then you'll have fewer porting issues. The issue is, people don't code that way unless they are being forced to.

> TNT unicode is probably the most used unicode solution

TNT is a good compromise, but it does not present a complete solution that includes design time support and architectural simplicity/uniformity.

> XChar/XString (8 or 16, depending on project options).

You already have types like that, and you have had them for 10 years. They are: AnsiChar/AnsiString, WideChar/WideString, and Char/String.

[…]

There's no need for an additional type. Other programming languages that span Ansi and Unicode have the same issue, and the same points of failure - code that was not written with both camps in mind.

The only languages that do not have this issue are those that don't support both camps. Java, for example, has always been taxed with memory consumption issues associated with having only Unicode strings. The .NET platform is fully Unicode (include Delphi.NET), so the only issue is code that was written prior to Unicode availability, and more recently code that was written in a Unicode context which fails to handle the more complicated world of Ansi and multibyte encoded character sets.

Correllaries to Murphy's Law: If something is adjustable, someone will adjust it incorrectly. If something has an option, someone will write code that does not handle that option correctly.

This is why I fight strongly against "just make it an option or a switch" solutions. The ideal is to have a single solution, so that there is no room to get it wrong. That's why I believe Unicode VCL is a better fit for something like a Win64 Delphi, because Unicode VCL would be the one and only 64 bit VCL. No flippin switchiness to add complexity to get between the programmer and his/her objective.

-Danny

Thanks, Danny! Keep them posts coming!

Friday, August 12, 2005

The ultimate Delphi IDE start-up hack

He's done it again! Petr Vones has hacked together and published an at-your-own-risk patching tool that will trim the start-up time of your Delphi IDE. I've tested it with my Delphi 7, 8 and 2005 IDEs and in each case, the startup time of the IDE is noticeably faster after applying Petr's clever hack.

Before you use the patcher, read Petr’s readme file where he notes:

Warning: removing the dup unit check code may cause unexpected errors, USE AT YOUR OWN RISK !!!

Reading the rest of this article, doesn’t hurt either. ;-)

Background information
At start-up the Delphi IDE (like any other application that uses the magic of Delphi's run-time packages) will load statically and dynamically linked packages (plain Windows .DLL files using a .BPL extension and containing Borland's magic plumbing to make it all work). Each package can contain any number of units and require any number of external packages. In this context, the main thing is that Bad Things™ (such as random AVs, .DFM streaming errors etc.) can happen if you were to load two or more packages containing the same (or a same-named) unit.

To prevent this from happening, the Delphi RTL contains a SysUtils.LoadPackage routine that is responsible for dynamically loading a new package. This is used by the IDE to load all packages (containing the standard VCL components, third party components, design time editors, wizards, exports etc.). One of the tasks LoadPackage performs is to check that the package being loaded does not contain a unit name that exists in one of the already loaded packages. If it detects a unit-name collision, an EPackageError exception with the message

Cannot load package 'PackageA' It contains unit 'UnitName' which is also contained in package 'PackageB'

is raised and the newly loaded package is unloaded again.
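From the caller's side, this behaviour can be sketched roughly as follows. This is only an illustration, assuming a program built with run-time packages; 'MyComponents.bpl' is a hypothetical package name. LoadPackage, UnloadPackage and EPackageError are the SysUtils declarations.

```pascal
program LoadPackageSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Handle: HMODULE;
begin
  try
    // 'MyComponents.bpl' is a hypothetical package name
    Handle := LoadPackage('MyComponents.bpl');
    try
      Writeln('Package loaded');
    finally
      UnloadPackage(Handle);
    end;
  except
    // Raised for a duplicate unit name, and also if the file cannot be loaded
    on E: EPackageError do
      Writeln('Package rejected: ', E.Message);
    on E: Exception do
      Writeln('Load failed: ', E.ClassName, ': ', E.Message);
  end;
end.
```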

Analysing the algorithm
If you have the Delphi RTL source code available (all serious Delphi programmers should), you can have a look at this logic by searching for "CheckForDuplicateUnits". Stop reading now and take a look at that code.

Ok, back already? Pretty complicated-looking code, don't you think? I've taken a quick shot at trying to analyse the complexity of checking the uniqueness of all unit names contained in all loaded packages (i.e. how it theoretically performs when the number of loaded packages and units increases). There are five nested loops:

  1. The outer loop (in the IDE) loading all the installed packages (read from the Registry)
  2. One recursive loop (InternalUnitCheck calling itself) to handle all the requires-links between the current package and all the sub-packages
  3. One iterative loop (in InternalUnitCheck) of all the contained units in the current package
  4. One iterative loop (in IsUnitPresent) of all currently loaded modules
  5. One iterative loop (nested in IsUnitPresent) of all the contained units in each loaded module
So the algorithm looks like it has the complexity of O(P * R * M * U * U) where
  • P is the number of installed packages that will be loaded by the IDE
  • R is the average number of required packages
  • M is the average number of loaded modules (starts low and increases as more and more modules have been loaded)
  • U is the average number of units contained in a package

M varies from 0 to ~P, and on average it will be about P / 2. Let's be nice and set the average number of required packages to just 2. The expression then simplifies to O( P * 2 * (P/2) * U^2) or O(P^2 * U^2).

U^2 will be some constant (but potentially large) number, so with N standing for the number of packages P, the dominant complexity is O(N^2).

What all this jumble-bumble means in practical terms is that the more packages you have to load, the slower it gets - no surprise there. But it also means that the running time rises quadratically as the number of packages and units increases. So theoretically, if loading 100 packages takes 10 seconds, then loading 200 packages should take in the neighbourhood of 40 seconds (instead of the 20 seconds that would be expected with a linear O(N) algorithm).
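The O(N^2) estimate can be checked with a toy cost model. The sketch below is not the RTL code; CountComparisons is a hypothetical function that simply counts unit-name comparisons, assuming every package holds U units and ignoring the requires-recursion. Doubling the package count from 100 to 200 should roughly quadruple the count.

```pascal
program QuadraticSketch;
{$APPTYPE CONSOLE}

// Simplified model: when a package is loaded, each of its U units is
// compared against the U units of every package already loaded.
function CountComparisons(P, U: Integer): Int64;
var
  Loaded: Integer;
begin
  Result := 0;
  for Loaded := 0 to P - 1 do
    Result := Result + Int64(U) * U * Loaded;
end;

var
  C100, C200: Int64;
begin
  C100 := CountComparisons(100, 10);
  C200 := CountComparisons(200, 10);
  Writeln('100 packages: ', C100, ' comparisons');
  Writeln('200 packages: ', C200, ' comparisons');
  Writeln('Ratio: ', C200 / C100 : 0 : 2);
end.
```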

Is there a problem?
For most people the start-up time of the IDE should not be a major issue. The safety that the unit-name checks gives you is probably worth the extra time it takes. If you are often changing the installed packages, testing freeware and commercial packages, installing your own development packages etc. the convenience of having the IDE explicitly inform you of unit name conflicts outweighs any start-up time improvements.

However, for people that have a fixed group of a large number of packages, the start-up time can get noticeably long. If you are willing to live a little dangerously and risk shooting yourself in the foot (getting random crashes or weird issues caused by duplicate unit problems), you may consider using Petr's hack.

What is the solution?
If we conclude that given a trend of increasing number of packages typically installed into the Delphi IDE, the start-up time starts getting uncomfortably long, what can we do about it?

Well, for the fool-hearted (brave?) among us, there is Petr's patching hack, of course. This is a brute-force solution that simply turns off all unit-name checks "simply" by patching the first instruction of the CheckForDuplicateUnits routine to be a RET instruction, effectively turning it into a NOP operation.

But if Borland decided to do something about this, what could they do? Considering the high complexity and O(N^2) behaviour of the existing CheckForDuplicateUnits implementation, the most obvious thing to do would be to change the algorithm into something a little more efficient, like O(NlogN) or even O(N). How can this be done?

The goal is to detect collisions between a large set of strings. A hashtable using a string key would be perfectly suited for this. One complication is that the current algorithm uses the linked structure of all currently loaded packages to iterate over them all.

If *all* loaded packages are loaded dynamically through calling LoadPackage this should not be a problem. However, if one or more packages are loaded statically (an application or package requires a set of other packages at load time), the OS will load these packages and the LoadPackage routine (and its unit-name checking logic) will never "see" them, and thus will fail to detect collisions with the unit names contained in them.

In addition a hashing list solution will have to take into consideration unloading of packages, removing unit names from the hashlist. The current algorithm doesn't have to take this into account as it uses the automatically linked up/torn down module chain.

Instead of maintaining a global hash-list that is kept between invocations of CheckForDuplicateUnits, it could create an empty one from scratch on each call and just use it as a relatively quick way of finding unit-name collisions. It would do this by first iterating over the already loaded modules, adding their unit names to the hash list. Then it can try to add the currently loading package unit names to the hash list – if a collision is detected here raise an error and unload it.
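A rough sketch of the local hash-list idea might look like this. It is not based on the RTL implementation: FindCollision and the unit names are made up for illustration, and plain string arrays stand in for the package info that the real code would walk. It uses THashedStringList from the IniFiles unit, filling it first and only doing lookups afterwards, since adding to the list invalidates its internal hash.

```pascal
program DupUnitSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IniFiles;

// Hypothetical helper: returns the first name in NewUnits that is
// already present in LoadedUnits, or '' if there is no collision.
function FindCollision(const LoadedUnits, NewUnits: array of string): string;
var
  Seen: THashedStringList;
  I: Integer;
begin
  Result := '';
  Seen := THashedStringList.Create;
  try
    // Step 1: register the unit names of the already loaded modules
    for I := Low(LoadedUnits) to High(LoadedUnits) do
      Seen.Add(LoadedUnits[I]);
    // Step 2: hashed lookup of each unit name in the package being loaded
    for I := Low(NewUnits) to High(NewUnits) do
      if Seen.IndexOf(NewUnits[I]) >= 0 then
      begin
        Result := NewUnits[I];
        Exit;
      end;
  finally
    Seen.Free;
  end;
end;

begin
  // Made-up unit names; the lookup is not case sensitive by default
  Writeln('Collision: ',
    FindCollision(['SysUtils', 'Classes', 'MyUtils'], ['MyForm', 'myutils']));
end.
```

In this example the second array plays the role of the package being loaded, and the call should report the clash on MyUtils.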

I’ve not analysed the possible implementations in detail but (assuming it would be possible to implement correctly) I’d guess that a global hash list implementation would give O(N) performance, while a local hash list implementation (that is re-populated on each call) would give O(NlogN).

A completely different solution (and one that could be implemented even if the basic algorithm is improved), is to have the IDE detect when a new graph of packages/units that have not been checked before is being loaded. Turn off the checks if there are no new or changed packages since the last check.

Conclusions
Hardware is getting faster, but the growing size and complexity of software may be eating up all the benefits. Specifically for the Delphi IDE with a large number of packages installed, start-up times can become sluggish.

I think that Petr's patching tool will end up being used by die-hard hackers like myself until a cleaner and safer solution is present. That is not necessarily a bad thing - it gives us the nice feeling of living on the edge and getting a faster IDE start-up time than everybody else ;).

Wednesday, January 19, 2005

Interface-to-interface casts

When you already have an interface reference and want to (try and) convert it into another interface reference, you have to use some kind of cast or conversion function. Just as with object-to-interface casts, there are a number of options.

The possibilities are hard-casts, as-casts, is-checks, the Supports function and the QueryInterface method. As we shall see, not all of these are available in both Win32 and .NET - and some of them behave differently.

Interface as-casts
In Win32, if you try to as-cast to an interface that is not implemented, an exception is raised. In Delphi 8 for .NET, nil is returned instead. The behaviour should be the same on both platforms. This bug has been fixed in Delphi 2005.

Interface hard-casts
The compiler allows hard-cast syntax from an interface reference to another interface in both Win32 and .NET. While the .NET cast is safe and performs a logical conversion (returning nil if it fails), the Win32 cast is completely unsafe and normally does not do what you want. You are basically telling the compiler that the interface reference on the right side already contains a reference of the cast-to interface type (and not the declared type). In general, this type of cast is not currently useful in Win32.

Interface is-checks
For some reason, in Win32, is-checks are not supported on interfaces. To check that an interface reference implements another interface, you have to use Supports or call QueryInterface directly. .NET however, does support is-checks on interface references.

Supports function
Both the Win32 and .NET platforms support (sic) using Supports to check that an interface reference implements another interface - and to return this new interface reference. On .NET the Supports function has the same run-time issues as when casting from Object to Interface (in fact, it is exactly the same Supports function overload that is used - any interface reference is compatible with TObject); it is relatively slow.

QueryInterface
Specific to Win32, all interfaces have a QueryInterface method inherited from the base interface IInterface (or IUnknown). Under the hood, the Supports function (and even the is and as operators) calls QueryInterface. There is nothing stopping you from calling QueryInterface directly, of course, but then you have tied yourself to Win32 and the code needs to be changed to work in .NET.

Code sample
Let's write a small sample that demonstrates all the different ways to cast and check from one interface reference to another.

program TestIntf2Inf;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  IMyInterface = interface
    ['{99D91C44-BCE7-4D35-A661-DE32E8AE56FC}']
    procedure Foo;
  end;
  IMyInterface2 = interface
    ['{2E200094-0643-46C8-87AF-AAB0A1F5801D}']
    procedure Bar;
  end;
  INotImplemented = interface
    ['{BAEE6F63-FF47-4877-9657-443B6D1555FA}']
    procedure Zoo;
  end;
  TMyObject = class(TInterfacedObject,
    IMyInterface, IMyInterface2)
    procedure Foo;
    procedure Bar;
  end;

procedure TMyObject.Foo;
begin
  Writeln(ClassName, '.Foo!');
end;

procedure TMyObject.Bar;
begin
  Writeln(ClassName, '.Bar');
end;

procedure Foo(const I: IMyInterface);
var
  MyInterface2: IMyInterface2;
  NotImplemented: INotImplemented;
begin
  // Win32 supports as and Supports.
  // Hard-cast is unsafe, is does not compile
  // .NET supports as, Supports, hard-cast and is
  MyInterface2 := I as IMyInterface2;
  MyInterface2.Bar;
  // hard-cast .NET: safe, returns nil on failure
  // hard-cast Win32: Unsafe, compiles but may crash/fail
  MyInterface2 := IMyInterface2(I);
  // Win32: Calls TMyObject.Foo!
  // .NET: Calls TMyObject.Bar
  MyInterface2.Bar;
  try
    NotImplemented := I as INotImplemented;
    if not Assigned(NotImplemented) then
      writeln('D8 .NET bug: as returns nil'+
        ' - should raise exception');
    NotImplemented.Zoo;
  except
    {$IFDEF Win32}
    on E: EIntfCastError do
    {$ENDIF}
    {$IFDEF CLR}
    on E: InvalidCastException do
    {$ENDIF}
      writeln('As designed: ', E.ClassName, E.Message);
    on E: Exception do
      writeln('Bug: ', E.ClassName, E.Message);
  end;
  // Supports works in both Win32 and
  // .NET, but is relatively slow in .NET
  if Supports(I, IMyInterface2, MyInterface2) then
    MyInterface2.Bar;
  {$IFDEF CLR}
  // intf is intf is not supported in Win32
  if I is IMyInterface2 then
    writeln('interface is interface suppported in .NET');
  {$ENDIF}
  {$IFDEF Win32}
  // QueryInterface returns 0 (S_OK) on success
  if I.QueryInterface(IMyInterface2, MyInterface2) = 0 then
    MyInterface2.Bar;
  Writeln('QueryInterface supported in Win32');
  {$ENDIF}
end;

var
  I : IMyInterface;
begin
  I := TMyObject.Create;
  try
    Foo(I);
  except
    on E: Exception do
      writeln(E.Message);
  end;
  readln;
end.


This code should compile and run from D7, D8.NET, D2005 Win32 and D2005 .NET. The following is the output in each case.


Output Delphi 8 .NET:
TMyObject.Bar
TMyObject.Bar
D8 .NET bug: as returns nil - should raise exception
Bug: NullReferenceExceptionObject reference not set to an instance of an object.
TMyObject.Bar
interface is interface suppported in .NET


Output Delphi 2005 .NET:
TMyObject.Bar
TMyObject.Bar
As designed: InvalidCastExceptionSpecified cast is not valid.
TMyObject.Bar
interface is interface suppported in .NET


Output Delphi 2005 Win32 and D7:
TMyObject.Bar
TMyObject.Foo!
As designed: EIntfCastErrorInterface not supported
TMyObject.Bar
TMyObject.Bar
QueryInterface supported in Win32


Specifically notice that the D8 as-cast bug has been fixed in D2005 and that Win32 hard-casts have strange effects, calling the wrong method! IMO, this is not a bug, but a consequence of Win32 hard-cast semantics.


When hard-casting on Win32 you are essentially telling the compiler: "Forget about the declared type of this variable and treat it like it was this type instead". You have to know what you're doing. I really cannot see any case where the current Win32 interface-to-interface cast is useful, so maybe Borland could beef up the dcc32 compiler to make it work like it does in dccil in a future version…? Determined hackers would still be able to do a raw binary cast by casting to Pointer first.


Hard-casting on .NET is still safe and normally performs a logical conversion, not a binary re-interpretation.

FastCoder's 2004 New Year's Speech

The fine FastCoder hackers residing over in the borland.public.delphi.language.basm newsgroup have summarised their activity in the past year. The highlights are:

  • John O'Harrow is the contributor of the most and best FastCode challenge entries (winning a Delphi 2005 Pro)
  • Three FastCode routines were included in the updated Win32 RTL in Delphi 2005 (CompareText, Int64Div and FillChar)
  • 16 challenges were completed
  • The best AnsiStringReplace routine is now some 17 times (!!!) faster than the Delphi RTL StringReplace function
  • A new major challenge is writing an improved memory manager (faster, less fragmentation, better multi-CPU performance, using less memory)
  • A 64-bit version of Delphi is desired

Read the whole speech here. Keep up the good work, FastCoders!



Copyright © 2004-2007 by Hallvard Vassbotn