Michael Zuskin

Elegant Code - Code Writing Style



Over-indenting is the enemy number one


Indent code fragments with as few tabs as possible.


Don't be carried away by extra indenting! In most cases, an indentation of 3 tabs says that the code could be written more elegantly, and an indentation of more than 3 tabs is a "red light" and signals about totally incorrect organization of code structure. Tab is like salt in a soup: it's a must, but over-use will make the product worthless. If you find in your code long "staircases" of closing braces (}) or END IFs then it's a good idea to take your habits critically! In my own development, I almost never exceed 2 tabs - obeying this very easy principle keeps my subprograms simple!

Loops with CONTINUE


Use continue in loops instead of indenting the subsequent code with one more tab.


This method is a heavy weapon in the war against over-indenting:


*** BAD code: ***


PB
do while [condition]
   if this.uf_data_ok_1() then
      if this.uf_data_ok_2() then
         if this.uf_data_ok_3() then
            [code fragment with its own indenting levels]
         end if
      end if
   end if
loop
C#
while ([condition])
{
   if (this.DataOk1())
   {
      if (this.DataOk2())
      {
         if (this.DataOk3())
         {
            [code fragment with its own indenting levels]
         }
      }
   }
}

*** GOOD code: ***


PB
do while [condition]
   if not this.uf_data_ok_1() then continue
   if not this.uf_data_ok_2() then continue
   if not this.uf_data_ok_3() then continue
   [code fragment with its own indenting levels]
loop
C#
while ([condition])
{
   if (!this.DataOk1()) continue;
   if (!this.DataOk2()) continue;
   if (!this.DataOk3()) continue;
   [code fragment with its own indenting levels]
}

No loop condition duplicating


Don't write the stop condition of a loop twice (before and inside the loop).


You can ask - why, if in most programming books we see the condition duplicated in loops whose body may be not executed even once? My "unofficial" answer is simple: why to write twice something which can be written only once to achieve the same goal, making the code longer? And the "official" answer is, probably, known to you - by the same reasons that we avoid any sort of code duplication: if a change must be done in the condition, both the copies of the code must be updated (but one of them can be mistakenly skipped).


*** BAD code: ***


PB
long ll_row = 0

ll_row = dw_cust.GetNextModified(ll_row, Primary!)
do while ll_row > 0
   [do something with the modified row]
   ll_row = dw_cust.GetNextModified(ll_row, Primary!)
loop
PL/SQL
FETCH emp_cursor INTO v_row;
WHILE emp_cursor%FOUND LOOP
   [do something]
   FETCH emp_cursor INTO v_row; -- deja vu! :-)
END LOOP;

An eternal loop with 'break' (or 'exit when', specific for PL/SQL) is very elegant solution in this situation:


*** GOOD code: ***


PB
long ll_row = 0

do while true
   ll_row = dw_cust.GetNextModified(ll_row, Primary!)
   if ll_row < 1 then exit
   [do something with the modified row]
loop
PL/SQL
LOOP
   FETCH emp_cursor INTO v_row;
   EXIT WHEN emp_cursor%NOTFOUND; -- or "IF emp_cursor%NOTFOUND THEN BREAK;"
   [do something]
END LOOP;

One executable statement per line


Don't write more than one executable statement in one line of code.


It's not only can be messy and not elegant but also makes debugging more difficult: if you obey this rule then you will always see what is the statement the debugger's cursor is standing near. For example, if your IF statement

is written in one line then you cannot clearly see if the program flow has stepped into the IF (and cannot put a breakpoint in it).

*** BAD code: ***


PB
if IsNull(li_age) then li_age = this.uf_get_age(ll_emp_id)
C#
if (age == null) age = this.GetAge(empId);
PL/SQL
IF v_age IS NULL THEN v_age := GET_AGE(v_emp_id); END IF;

*** GOOD code: ***


PB
if IsNull(li_age) then
   li_age = this.uf_get_age(ll_emp_id)
end if
C#
if (lastName == null)
   {
   age = this.GetAge(empId);
   }
PL/SQL
IF v_age IS NULL THEN
   v_age := GET_AGE(v_emp_id);
END IF;

This idea is especially important in the fight against expressions complexity. As we have discussed, good developers break one long function into a number of short sub-functions - the same philosophy works for complex, super-nested expression: we should break them down into stand-alone, simple lines:


*** BAD code: ***


PB
ls_country_name = this.uf_get_country_name_by_city_id(this.uf_get_country_id_by_city_id(this.uf_get_city_id(ll_row_num)))
C#
countryName = this.GetCountryNameByCountryId(this.GetCountryIdByCityId(this.GetCityId(rowNum)));

*** GOOD code: ***


PB
ls_city_id = this.uf_get_city_id(ll_row_num)
ls_country_id = this.uf_get_country_id_by_city_id(ls_city_id)
ls_country_name = this.uf_get_country_name_by_country_id(ls_country_id)
C#
cityId = this.GetCityId(rowNum);
countryId = this.GetCountryIdByCityId(cityId);
countryName = this.GetCountryNameByCountryId(countryId);

The last example demonstrates how this approach simplifies debugging (in addition to better readability!): if the variable countryName has not been populated as expected then you can see in the debugger (without STEP IN) which step exactly fails. In the nested version, if you want to STEP IN GetCountryNameByCountryId to debug it then you are forced firstly to STEP IN (and STEP OUT from) each one of the inner methods, beginning from the most nested GetCityId.

Store values, returned by subroutines and expressions, in variables


Don't directly use values, returned by subroutines and expressions, as an input of other subroutines and expressions; instead, store them in interim variables, and use these variables in the rest of the method.


There are at least two reasons to do that:

  1. It makes debugging easier - you can see the returned value immediately, before it is "buried" in any kind of complicated collection or data set, or even disappears at all after a comparison (like if (myVar == GetSomething())).

  2. It helps to break one complicated line of code into 2-3 simpler and clearer lines if the called function is nested in an expression.

These two reasons dictate to store the return value of a function in a variable always - even if the function is called only once (if more than once then we have no questions at all!). The only exception from this rule - a well-named Boolean function which is called only once in the script and is used directly in an if statement: in this case we can understand the returned value - true or false - looking in the debugger if the program flow goes into the if or skips it (or goes into the else section). Ensure the called function doesn't return null which is treated as false by the program flow.


If you need to call a same deterministic function (i.e. giving always the same output for the same input in the given context) more than once in your script in order to re-use its ret. value in different fragments, then there are additional advantages of storing the ret. value in a variable:

  1. Better performance (the function is run only once), especially if the function is not extremely efficient.

  2. Signals that the called function is deterministic (otherwise developers can think it returns different values on each call).

So, call the function only once, and use the returned value in the rest of the script as many times as you need. This advice is especially important when the result of a function is used as a stop condition in a loop. For example:

PB
for ll_row = 1 to dw_cust.RowCount()
   ...
next
C#
for (i = 1; i++; i < SomeCollection.GetRowsCount())
{
   ...
}

Imagine that the function returns 10000 - it means, it is called 10000 times! The solution is obvious:


PB
long ll_row_count
ll_row_count = dw_cust.RowCount()
for ll_row = 1 to ll_row_count
   ...
next
C#
int rowsCount = SomeCollection.GetRowsCount();
for (i = 1; i++; i < rowsCount)
{
   ...
}

But be careful - if the loop can change the number of rows in the collection, then function is not deterministic!

Process impossible options (a piece of defensive programming)


Process in choose case/switch constructions all existing options. Signal error if an unexpected option is supplied to the construction in the run-time.


When your code processes a number of pre-defined options (like statuses or activity codes) then don't forget: the list of possible options can grow in the future, and you (or other developers) can forget to process the newborn option(s) producing a hidden bug. We can prevent that by forcing the code to complain: "Developer, THINK how to process the new option!". Two simple things should be done for that:


1. In the choose case/switch construction, add new case for EACH option which currently exists in the system. If an option is NOT processed then create a case for it anyway and write a comment like "do nothing", "irrelevant" or "not applicable" to let developers know you don’t process them intentionally and not by mistake.


2. Add case else/default section that will signal about an unexpected option (throw an exception, display an error message or whatever).


Shortly, see two the following fragments. In the example I will use 4 customer statuses: Active, Pending, Inactive and Deleted, but only Active and Pending customers are processed currently by the business logic:


*** BAD code ***

(inactive status is not even mentioned...):

PB
choose case ls_cust_status
case n_cust_status.ACTIVE
   [code fragment processing active customers]
case n_cust_status.PENDING
   [code fragment processing pending customers]
end choose
C#
switch (custStatus)
{
   case CustStatus.Active:
      [code fragment processing active customers]
      break;
   case CustStatus.Pending:
      [code fragment processing pending customers]
      break;
}

*** GOOD code ***

(inactive status is listed explicitly in a specially created 'case'):

PB
choose case ls_cust_status
case n_cust_status.ACTIVE
   [code fragment processing active customers]
case n_cust_status.PENDING
   [code fragment processing pending customers]
case n_cust_status.INACTIVE
   // do nothing
case else
   f_throw(PopulateError(0, "No case defined for customer status " + ls_cust_status))
end choose
C#
switch (custStatus)
{
   case CustStatus.Active:
      [code fragment processing active customers]
      break;
   case CustStatus.Pending:
      [code fragment processing pending customers]
      break;
   case CustStatus.Inactive:
   case CustStatus.Deleted:
      // do nothing
      break;
   default:
      throw new Exception ("No case defined for customer status " + custStatus);
}

So, if a new status (for example, 'deceased') will be added to the business after many years then the code fragment itself will force the then developers to update the logic: if the variable ls_cust_status/custStatus will contain the value of n_cust_status.DESEASED/CustStatus.Deceased, the program flow will go to the case else/default section and the exception will be thrown. And that will usually happen in a pretty early stage of development or unit test! But even if the exception will be thrown in the production - it's better than long living with a hidden bug (which can be potentially very expensive). Maybe, the new case must be treated in a special way, and you have to decide (or discuss with business analysts) how to process it. If the new status requires no special treatment, then simply add it to the "do nothing" case. I remember getting a chain of such errors when I was building an application for a football association (six or seven - it was funny!) after adding a new code value - that allowed me to fix possible bugs even before unit test! The messages chain was similar to a penalty shootouts - and I won with no goal conceded!



Never write business code in case else/default section. If a few options must be treated in a same way, physically list them ALL in one case. There are three reasons for that:

We discussed choose case/switch-es, but the conception works also for if-s. Usually you are forced to replace if construction with choose case/switch to prevent terrible heap of else if-s:


*** BAD code: ***

PB
if is_curr_entity = n_entity.CAR then
   this.Retrieve(il_car_id)
else
   this.Retrieve(il_bike_id)
end if
C#
if (_currEntity == Entity.Car)
{
   this.Retrieve(_carId);
}
else
{
   this.Retrieve(_bikeId);
}

*** GOOD code: ***

PB
choose case is_curr_entity
case n_entity.CAR
   this.Retrieve(il_car_id)
case n_entity.BIKE
   this.Retrieve(il_bike_id)
case else
   MessageBox("Debug Error", "No case defined for entity " + is_curr_entity + " in dw_xxx.uf_yyy()")
end choose
C#
switch (_currEntity)
{
   case Entity.Car:
      this.Retrieve(_carId);
      break;
   case Entity.Bike:
      this.Retrieve(_bikeId);
      break;
   default:
      throw new Exception ("No case defined for entity " + _currEntity);
}

Don't use Boolean flags (including CHAR variables with possible values like 'Y' and 'N' or '1' and '0') if there can be more than 2 options. Let's say, you have one class for both cars and bikes (because these entities are processed is a very similar way). Sometimes you want to know in which of two the modes the object (the instance) is created - in the car mode or in the bike mode. So, you could think about a Boolean flag _isCarMode (C#)/ib_is_car_mode (PB) which will be initialized to true or false depending on the mode and used this way:


*** BAD code: ***

PB
if ib_is_car_mode then
   this.Retrieve(il_car_id)
else
   this.Retrieve(il_bike_id)
end if
C#
if (_isCarMode)
{
   this.Retrieve(_carId);
}
else // Bike Mode
{
   this.Retrieve(_bikeId);
}

The best solution in this situation is to create a variable of an enum type (in PB, which doesn't support enums until version 12, create 2 the already described constants n_entity.CAR and n_entity.BIKE) notwithstanding that there are only 2 possible options - see the previous "GOOD code". So, if one day you want to use the discussed class also for boats, simply add Boat to Entities enumeration (or create n_entity.BOAT constant), initialize _currEntity/is_curr_entity with it, run the application and... follow your own instructions, written months or years ago!


Sometimes it makes sense NOT to follow this advice - for example, if there are really many options but you process one or two of them.


Application codes

constants (or enumerations)

Don't hardcode application codes (like statuses, entities’ types etc.); instead, use constants or enumerations.


It’s always a good practice to create constants/enumerations for codes you mention in code, even if these codes used to be hardcoded in the application for years. Prefer the correct programming stile (with its elegancy and maintainability), not bad traditions.



*** BAD code: ***

PB
if li_inv_status = 8 then...
C#
if (invStatus == 8)...

*** GOOD code: ***

PB
if li_inv_status = nv_inv_status.CANCELED then...
C#
if (invStatus == InvStatus.Canceled)...

Mnemonic strings as application codes


Make system codes' values self-explaining, so developers don't need to look in the catalog tables to understand their meaning.


We can avoid the problem, described in the previous topic, if our application codes (like statuses, modes etc.) are textual and not meaningless numeric (like 1, 2, 3...). To have self-explanatory codes - it's a great idea which makes debugging, reading code and exploring data in DB tables much easier! It's better to see 'CLOSED' (instead of mystery number 3) as a variable value in debugger or a value in a DB table when your brain is busy looking for a bug, isn't it? These mnemonic strings don't need to be long - VARCHAR(10) is enough, we can always shorten words if needed (still keeping them understandable).


*** BAD approach ***

(developers need to look in the codes catalog table to understand the numeric code):

ORDER table:

order_id customer_id order_status_code
123001 98765 3
123002 43210 1
123003 12345 2
ORDER_STATUS table (codes catalog):

codedescription
1'Open'
2'Cancelled'
3'Closed'

*** GOOD approach ***

(codes are self-understandable; the codes catalog is kept only for referential integrity and to display codes in GUI):

ORDER table:

order_id customer_id order_status_code
123001 98765 'CLSD'
123002 43210 'OPN'
123003 12345 'CNCLD'
ORDER_STATUS table (codes catalog):

codedescription
'OPN''Open'
'CNCLD''Cancelled'
'CLSD''Closed'

But anyway we have to use constants or enums in our code and not the values directly:


*** BAD code: ***

PB if ls_new_status = 'CLOSED' then...
C# if (newStatus == 'CLOSED')...


*** GOOD code: ***

PB if ls_new_status = n_order_status.CLOSED then...
C# if (newStatus == OrderStatus.Closed)...


The first approach is readable quite well, but bugs-prone: we can mistype the status, and no compilation error will be gotten.

/* Comments */


Comment your code if it is not completely straightforward.


It is needed not only for other developers, who will try to understand your code, but for yourself too: now you remember what the code does, but will you after a week, after a month, after a year? But don't leave comments if they are not really needed. Incredible, but I saw comments 'Declare variables:' just before a variables declaration section and 'Call function XXX:' just before calling function XXX! But really foggy fragments were absolutely comments-free!


Comments can help you not only understand existing methods, but also write new ones: just after creation of a new function, write comment before each future code fragment, performing a different logical task (as if you would comment an existing code), and AFTER THAT begin to write executed code. Below is an example of an initial skeleton (built of comments!) for a function which calculates a new balance for a customer (also pay attention that we can avoid articles "the" and "a" in comments - that shortens them which is always good):

private decimal calcNewBalance (integer customerID, decimal amountToAdd)
   {
   // Validate parameters:
	
   // Retrieve existing balance of customer:
			
   // Retrieve their allowed overdraft:
			
   // Calculate new balance:
			
   // If it exceeds allowed overdraft then return original balance:
			
   // If this code reached then new amount is ok - return it:

   }

In that very early stage of the function writing it's easier to concentrate on what should be done by the function (I would call that "upper-level business logic"), and only after the comments skeleton has been written we can begin writing executed code (each fragment just after its corresponding comment) with much less chances of having to re-write code.

Private and public affairs in instance variables


Avoid public instance variables.


Create them only private (or protected if you are planning the class to be extended and the descendants can have a need to access the variable directly). The purpose of instance/static variables is to keep internal information of the object/class, so the outer world (other classes) should access this information only using public properties (or public methods, if your programming language doesn't have properties), i.e. in a controllable way.

Write RETURN in the end of void methods


Always end your void methods with return instruction.


Compilers don't force us to write return as the last statement of subroutines which return nothing, but it can be pretty useful in debugging if the last executable construction of the function enables program flow to go inside it or to jump over it (for example, if the script's LAST construction is an if, switch or a loop). If that construction is skipped by the program flow (let's say, doesn't go inside an if) then the debuggers cursor will stop on the return being discussed, so you have a chance to do something before the debugged function has been exited. Don't analyze if the final return can be useful or not - simply write it in the end of EACH void function, that will never do any harm (that was a requirement in one of companies I has worked for).




blog comments powered by Disqus




<< prev CONTENTS next >>