






                 Recursive Make Considered Harmful

                            _P_e_t_e_r _M_i_l_l_e_r
                      millerp@canb.auug.org.au



                              AABBSSTTRRAACCTT
         For large UNIX projects, the traditional method of
         building the project is to use recursive make.  On
         some projects, this results in build times which
         are unacceptably large, when all you want to do is
         change one file.  In examining the source of the
         overly long build times, it became evident that a
         number of apparently unrelated problems combine to
         produce the delay, but on analysis all have the
         same root cause.
         This paper explores a number of problems regarding
         the use of recursive make, and shows that they are
         all symptoms of the same problem: symptoms that
         the UNIX community has long accepted as a fact of
         life, but which need not be endured any longer.
         These problems include recursive makes which take
         ``forever'' to work out that they need to do
         nothing, recursive makes which do too much, or too
         little, and recursive makes which are overly
         sensitive to changes in the source code and
         require constant Makefile intervention to keep
         them working.
         The resolution of these problems can be found by
         looking at what make does, from first principles,
         and then analyzing the effects of introducing
         recursive make to this activity.  The analysis
         shows that the problem stems from the artificial
         partitioning of the build into separate subsets.
         This, in turn, leads to the symptoms described.
         To avoid the symptoms, it is only necessary to
         avoid the separation: use a single make session
         to build the whole project, which is not quite the
         same as a single Makefile.
         This conclusion runs counter to much accumulated
         folk wisdom in building large projects on UNIX.
         Some of the main objections raised by this folk
         wisdom are examined and shown to be unfounded.
         The results of actual use are far more
         encouraging, with routine development performance
         improvements significantly greater than intuition
         may indicate, and without the intuitively expected
         compromise of modularity.  The use of a whole
         project make is not as difficult to put into
         practice as it may at first appear.






    Peter Miller            19 March 2005                 Page 1





    AUUGN'97                   Recursive Make Considered Harmful


               +------------------------------------+
               |Miller, P.A. (1998), Recursive Make |
               |Considered Harmful,                 |
               |AUUGN Journal of AUUG Inc., 19(1),  |
               |pp. 14-25.                          |
               +------------------------------------+



    11..  IInnttrroodduuccttiioonn

    For large UNIX software development projects, the
    traditional methods of building the project use what has
    come to be known as ``recursive make.''  This refers to the
    use of a hierarchy of directories containing source files
    for the modules which make up the project, where each of
    the sub-directories contains a Makefile which describes the
    rules and instructions for the make program.  The complete
    project build is done by arranging for the top-level
    Makefile to change directory into each of the
    sub-directories and recursively invoke make.

    This paper explores some significant problems encountered
    when developing software projects using the recursive make
    technique.  A simple solution is offered, and some of the
    implications of that solution are explored.

    Recursive _m_a_k_e results in a directory tree which looks some-
    thing like this:
                          +++
                          ++-_P+_r+_o_j_e_c_t
                           ++++Mmaokdeufliel1e
                           |++-++Makefile
                           | +-++source1.c
                           | +-++_e_t_c_._._.
                           ++++m+o+dule2
                             +-++Makefile
                             +-++source2.c
                             +-++_e_t_c_._._.
                                |
    This  hierarchy  of  modules can be nested arbitrarily deep.
    Real-world projects often use two-  and  three-level  struc-
    tures.

    11..11..  AAssssuummeedd KKnnoowwlleeddggee

    This paper assumes that the reader is familiar with
    developing software on UNIX, with the make program, and with
    the issues of C programming and include file dependencies.


    -----------
    Copyright   (C)  1997  Peter  Miller;  All  rights
    reserved.

    This paper assumes that you have installed GNU Make on your
    system and are moderately familiar with its features.  Some
    features of make described below may not be available if you
    are using the limited version supplied by your vendor.

    22..  TThhee PPrroobblleemm

    There are numerous problems with recursive make, and they
    are usually observed daily in practice.  Some of these
    problems include:

    o It is very hard to get the order of the recursion into the
      sub-directories correct.  This order is very unstable and
      frequently needs to be manually ``tweaked.''  Increasing
      the number of directories, or increasing the depth in the
      directory tree, causes this order to be increasingly
      unstable.

    o It is often necessary to do more than one pass over the
      sub-directories to build the whole system.  This,
      naturally, leads to extended build times.

    o Because the builds take so long, some dependency
      information is omitted; otherwise development builds
      would take unreasonable lengths of time, and the
      developers would be unproductive.  This usually leads to
      things not being updated when they need to be, requiring
      frequent ``clean'' builds from scratch, to ensure
      everything has actually been built.

    o Because inter-directory dependencies are either omitted or
      too hard to express, the Makefiles are often written to
      build too much to ensure that nothing is left out.

    o The inaccuracy of the dependencies, or the simple lack of
      dependencies, can result in a product which is incapable
      of building cleanly, requiring the build process to be
      carefully watched by a human.

    o Related to the above, some projects are incapable of
      taking advantage of various ``parallel make''
      implementations, because the build does patently silly
      things.

    Not  all  projects  experience all of these problems.  Those
    that do experience the problems may  do  so  intermittently,
    and  dismiss the problems as unexplained ``one off'' quirks.
    This paper attempts to bring together a  range  of  symptoms
    observed over long practice, and presents a systematic anal-
    ysis and solution.

    It must be emphasized that this paper does not suggest that
    make itself is the problem.  This paper is working from the
    premise that make does not have a bug, and that make does
    not have a design flaw.  The problem is not in make at all,
    but rather in the input given to make: the way make is
    being used.

    33..  AAnnaallyyssiiss

    Before it is possible to address these seemingly unrelated
    problems, it is first necessary to understand what make does
    and how it does it.  It is then possible to look at the
    effects recursive make has on how make behaves.

    33..11..  WWhhoollee PPrroojjeecctt MMaakkee

    _M_a_k_e is an expert system.  You give it a set  of  rules  for
    how  to  construct  things,  and a target to be constructed.
    The rules can be decomposed into pair-wise ordered dependen-
    cies between files.  _M_a_k_e takes the rules and determines how
    to build the given target.  Once it has  determined  how  to
    construct the target, it proceeds to do so.

    _M_a_k_e  determines  how  to build the target by constructing a
    _d_i_r_e_c_t_e_d _a_c_y_c_l_i_c _g_r_a_p_h_, the DAG familiar  to  many  Computer
    Science  students.  The vertices of this graph are the files
    in the system, the edges of this graph  are  the  inter-file
    dependencies.   The  edges of the graph are directed because
    the pair-wise dependencies  are  ordered;  resulting  in  an
    _a_c_y_c_l_i_c graph - things which look like loops are resolved by
    the direction of the edges.

    This paper will use a small example project for its
    analysis.  While the number of files in this example is
    small, there is sufficient complexity to demonstrate all of
    the above recursive make problems.  First, however, the
    project is presented in a non-recursive form.

                           Project
                           +-- Makefile
                           +-- main.c
                           +-- parse.c
                           +-- parse.h

    The Makefile in this small project looks like this:

                    +--------------------------+
                    |OBJ = main.o parse.o      |
                    |prog: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |main.o: main.c parse.h    |
                    |  $(CC) -c main.c         |
                    |parse.o: parse.c parse.h  |
                    |  $(CC) -c parse.c        |
                    +--------------------------+
    Some of the implicit rules of make are presented here
    explicitly, to assist the reader in converting the Makefile
    into its equivalent DAG.

    The above Makefile can be drawn as a DAG  in  the  following
    form:
                     
                         prog
                       /      \
                   main.o    parse.o
                   /    \    /    \
              main.c   parse.h   parse.c

         (Each edge runs down from a target to a file it
          depends on.)

    This is an _a_c_y_c_l_i_c graph because of the arrows which express
    the ordering of the  relationship  between  the  files.   If
    there  _w_a_s a circular dependency according to the arrows, it
    would be an error.
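A depth-first search can check the acyclic requirement mechanically: a dependency that leads back to a file still on the current search path is a loop. The Python sketch below only illustrates the idea and is not how any particular make is implemented. DEPS is a transcription of the example Makefile. find_cycle, and the extra rule that makes parse.h depend on prog, are hypothetical.

```python
# The example Makefile as a graph: each target maps to the files it
# depends on.  Edges run from a target to its dependencies.
DEPS = {
    "prog": ["main.o", "parse.o"],
    "main.o": ["main.c", "parse.h"],
    "parse.o": ["parse.c", "parse.h"],
}


def find_cycle(deps):
    """Return one dependency loop as a list of files, or None for a DAG."""
    ON_PATH, DONE = 1, 2
    state = {}          # file -> ON_PATH while being explored, DONE after
    path = []           # files on the current depth-first path

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for dep in deps.get(node, []):
            if state.get(dep) == ON_PATH:        # back edge: a loop
                return path[path.index(dep):] + [dep]
            if dep not in state:
                loop = visit(dep)
                if loop:
                    return loop
        path.pop()
        state[node] = DONE
        return None

    for node in deps:
        if node not in state:
            loop = visit(node)
            if loop:
                return loop
    return None


print(find_cycle(DEPS))                      # None: the example is acyclic
# A hypothetical bad rule making parse.h depend on prog closes a loop.
print(find_cycle({**DEPS, "parse.h": ["prog"]}))
# ['prog', 'main.o', 'parse.h', 'prog']
```

The loop is found from the graph alone, before any file timestamps are consulted.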

    Note that the object files (.o) are dependent on the include
    files  (.h) even though it is the source files (.c) which do
    the including.  This is because if an include file  changes,
    it is the object files which are out-of-date, not the source
    files.

    The second part of what _m_a_k_e does it to perform a  _p_o_s_t_o_r_d_e_r
    traversal of the DAG.  That is, the dependencies are visited
    first.  The actual order of traversal is undefined, but most
    _m_a_k_e  implementations work down the graph from left to right
    for edges below the same vertex, and most  projects  implic-
    itly  rely on this behavior.  The last-time-modified of each
    file is examined, and higher files are determined to be out-
    of-date  if  any of the lower files on which they depend are
    younger.  Where a file is determined to be out-of-date,  the
    action  associated with the relevant graph edge is performed
    (in the above example, a compile or a link).
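The two phases can be sketched in a few lines of Python. This is a simplified model, not the implementation of any real make. DEPS transcribes the example Makefile. Integers drawn from a counter stand in for the file modification times that make would obtain with stat(). The names make and build are hypothetical helpers.

```python
import itertools

# Each target maps to the files it depends on, transcribed from the example
# Makefile.  Files that never appear as keys (main.c, parse.c, parse.h) are
# source files: leaves of the DAG.
DEPS = {
    "prog": ["main.o", "parse.o"],
    "main.o": ["main.c", "parse.h"],
    "parse.o": ["parse.c", "parse.h"],
}


def make(target, deps, mtime, clock, rebuilt, seen):
    """Postorder walk: bring every dependency up to date, then the target."""
    if target in seen:                 # parse.h is shared; visit it only once
        return
    seen.add(target)
    for dep in deps.get(target, []):   # left to right, as most makes do
        make(dep, deps, mtime, clock, rebuilt, seen)
    if target not in deps:             # a source file: nothing to build
        return
    out_of_date = target not in mtime or any(
        mtime[dep] > mtime[target] for dep in deps[target])
    if out_of_date:
        mtime[target] = next(clock)    # "run the action": file is now newest
        rebuilt.append(target)


def build(target, deps, mtime, clock):
    """Return the targets rebuilt, in the order they were built."""
    rebuilt = []
    make(target, deps, mtime, clock, rebuilt, set())
    return rebuilt


clock = itertools.count(1)             # monotonically increasing fake time
mtime = {name: next(clock) for name in ("main.c", "parse.c", "parse.h")}
print(build("prog", DEPS, mtime, clock))  # ['main.o', 'parse.o', 'prog']
print(build("prog", DEPS, mtime, clock))  # []: everything is up to date
mtime["parse.h"] = next(clock)            # edit the include file
print(build("prog", DEPS, mtime, clock))  # ['main.o', 'parse.o', 'prog']
```

If parse.c were touched instead, only parse.o and prog would be rebuilt, because main.o does not depend on it.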

    The use of recursive _m_a_k_e affects both phases of the  opera-
    tion of _m_a_k_e_: it causes _m_a_k_e to construct an inaccurate DAG,
    and it forces _m_a_k_e to traverse the DAG in  an  inappropriate
    order.

    33..22..  RReeccuurrssiivvee MMaakkee

    To examine the effects of recursive _m_a_k_es, the above example
    will be artificially segmented into two modules,  each  with
    its  own  Makefile,  and a top-level Makefile used to invoke
    each of the module Makefiles.

    This example is intentionally artificial, and thoroughly so.
    However,  all  ``modularity'' of all projects is artificial,
    to some extent.  Consider: for  many  projects,  the  linker
    flattens it all out again, right at the end.

    The directory structure is as follows:

                          Project
                          +-- Makefile
                          +-- ant
                          |   +-- Makefile
                          |   +-- main.c
                          +-- bee
                              +-- Makefile
                              +-- parse.c
                              +-- parse.h

    The  top-level  Makefile  often  looks  a  lot  like a shell
    script:

                  +-------------------------------+
                  |MODULES = ant bee              |
                  |all:                           |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  +-------------------------------+
    The ant/Makefile looks like this:

                  +------------------------------+
                  |all: main.o                   |
                  |main.o: main.c ../bee/parse.h |
                  |  $(CC) -I../bee -c main.c    |
                  +------------------------------+
    and the equivalent DAG looks like this:
                         
                   main.o
                   /    \
              main.c   parse.h

    The bee/Makefile looks like this:

                   +----------------------------+
                   |OBJ = ../ant/main.o parse.o |
                   |all: prog                   |
                   |prog: $(OBJ)                |
                   |  $(CC) -o $@ $(OBJ)        |
                   |parse.o: parse.c parse.h    |
                   |  $(CC) -c parse.c          |
                   +----------------------------+
    and the equivalent DAG looks like this:

                         prog
                       /      \
                   main.o    parse.o
                             /    \
                        parse.h   parse.c

    Take a close look at the DAGs.  Notice how neither is
    complete: there are vertices and edges (files and
    dependencies) missing from both DAGs.  When the entire build
    is done from the top level, everything will work.

    But  what  happens  when  small changes occur?  For example,
    what would happen if the parse.c and parse.h files were gen-
    erated from a parse.y yacc grammar?  This would add the fol-
    lowing lines to the bee/Makefile:

                    +--------------------------+
                    |parse.c parse.h: parse.y  |
                    |  $(YACC) -d parse.y      |
                    |  mv y.tab.c parse.c      |
                    |  mv y.tab.h parse.h      |
                    +--------------------------+
    And the equivalent DAG changes to look like this:
                       
                         prog
                       /      \
                   main.o    parse.o
                             /    \
                        parse.h   parse.c
                             \    /
                             parse.y

    This change has a simple effect: if parse.y is edited,
    main.o will not be constructed correctly.  This is because
    the DAG for ant knows about only some of the dependencies of
    main.o, and the DAG for bee knows none of them.

    To understand why this happens, it is necessary to look at
    the actions make will take from the top level.  Assume that
    the project is in a self-consistent state.  Now edit parse.y
    in such a way that the generated parse.h file will have
    non-trivial differences.  When the top-level make is
    invoked, first ant and then bee is visited.  But ant/main.o
    is not recompiled, because bee/parse.h has not yet been
    regenerated and thus does not yet indicate that main.o is
    out-of-date.  It is not until bee is visited by the
    recursive make that parse.c and parse.h are reconstructed,
    followed by parse.o.  When the program is linked, main.o and
    parse.o are non-trivially incompatible.  That is, the
    program is wrong.

    33..33..  TTrraaddiittiioonnaall SSoolluuttiioonnss

    There  are three traditional fixes for the above ``glitch.''

    33..33..11..  RReesshhuuffffllee

    The first is to manually tweak the order of the modules in
    the top-level Makefile.  But why is this tweak required at
    all?  Isn't make supposed to be an expert system?  Is make
    somehow flawed, or did something else go wrong?

    To answer this question, it is necessary to look not at the
    graphs, but at the order of traversal of the graphs.  In
    order to operate correctly, make needs to perform a
    postorder traversal, but in separating the DAG into two
    pieces, make has not been allowed to traverse the graph in
    the necessary order; instead the project has dictated an
    order of traversal.  An order which, when you consider the
    original graph, is plain wrong.  Tweaking the top-level
    Makefile corrects the order to one similar to that which
    make could have used.  Until the next dependency is added...

    Note that ``make -j'' (parallel build) invalidates many of
    the ordering assumptions implicit in the reshuffle solution,
    making it useless.  And then there are all of the sub-makes
    doing their builds in parallel, too.

    33..33..22..  RReeppeettiittiioonn

    The  second  traditional  solution  is to make more than one
    pass in the top-level Makefile, something like this:

                  +-------------------------------+
                  |MODULES = ant bee              |
                  |all:                           |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  +-------------------------------+

    This doubles the length of time it takes to perform the
    build.  But that is not all: there is no guarantee that two
    passes are enough!  The upper bound on the number of passes
    is not even proportional to the number of modules; it is
    instead proportional to the number of graph edges which
    cross module boundaries.
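The growth in passes can be shown with a hypothetical worst case. Take a chain of n modules in which each module's output depends on the output of the next module, while the top-level loop visits them in the opposite order. The Python sketch below uses the same simplified postorder walk as before, with integer timestamps and hypothetical names such as m0/out. It edits the deepest module and counts the passes that still rebuild something.

```python
import itertools


def make(target, deps, mtime, clock, rebuilt, seen):
    """Postorder walk; a file with no rule in `deps` is treated as a leaf."""
    if target in seen:
        return
    seen.add(target)
    for dep in deps.get(target, []):
        make(dep, deps, mtime, clock, rebuilt, seen)
    if target in deps and (target not in mtime or any(
            mtime[d] > mtime[target] for d in deps[target])):
        mtime[target] = next(clock)
        rebuilt.append(target)


def chain(n):
    """Hypothetical project: module i's output depends on module i+1's."""
    modules = []
    for i in range(n):
        deps = [f"m{i}/src"] + ([f"m{i + 1}/out"] if i + 1 < n else [])
        modules.append({f"m{i}/out": deps})   # one sub-Makefile per module
    return modules


def passes_needed(n):
    """Top-level passes (visiting m0, m1, ...) that still rebuild something."""
    modules = chain(n)
    clock = itertools.count(1)
    mtime = {f"m{i}/src": next(clock) for i in range(n)}
    for sub in reversed(modules):              # initial, consistent build
        make(next(iter(sub)), sub, mtime, clock, [], set())
    mtime[f"m{n - 1}/src"] = next(clock)       # edit the deepest module
    passes = 0
    while True:
        rebuilt = []
        for sub in modules:                    # the for-loop in the Makefile
            make(next(iter(sub)), sub, mtime, clock, rebuilt, set())
        if not rebuilt:
            return passes
        passes += 1


for n in (2, 3, 5):
    print(f"{n} modules, {n - 1} cross-module edges: "
          f"{passes_needed(n)} passes")       # passes_needed(n) == n here
```

This chain has n - 1 cross-module edges, and the sketch needs n rebuilding passes plus one more pass to confirm that nothing is left to do. The pass count therefore grows with the cross-module edges rather than stopping at a fixed number such as two.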

    33..33..33..  OOvveerrkkiillll

    We have already seen an example of how recursive make can
    build too little, but another common problem is to build too
    much.  The third traditional solution to the above glitch is
    to add even more lines to ant/Makefile:

                    +--------------------------+
                    |.PHONY: ../bee/parse.h    |
                    |../bee/parse.h:           |
                    |    cd ../bee; \          |
                    |    make clean; \         |
                    |    make all              |
                    +--------------------------+
    This means that whenever main.o is made, parse.h will always
    be considered to be out-of-date.  All of bee will always be
    rebuilt including parse.h, and so main.o will always be
    rebuilt, even if everything was self-consistent.

    Note that ``make -j'' (parallel build) invalidates many of
    the ordering assumptions implicit in the overkill solution,
    making it useless, because all of the sub-makes are doing
    their builds ("clean" then "all") in parallel, constantly
    interfering with each other in non-deterministic ways.

    44..  PPrreevveennttiioonn

    The above analysis is based on one simple action: the DAG
    was artificially separated into incomplete pieces.  This
    separation resulted in all of the problems familiar to
    recursive make builds.

    Did  _m_a_k_e  get it wrong?  No.  This is a case of the ancient
    GIGO principle: _G_a_r_b_a_g_e _I_n_, _G_a_r_b_a_g_e _O_u_t_.   Incomplete  Make-
    files are _w_r_o_n_g Makefiles.

    To avoid these problems, don't break the DAG into pieces;
    instead, use one Makefile for the entire project.  It is not
    the recursion itself which is harmful; it is the crippled
    Makefiles which are used in the recursion which are wrong.
    It is not a deficiency of make itself that recursive make is
    broken: make does the best it can with the flawed input it
    is given.

         ``_B_u_t_,  _b_u_t_,  _b_u_t_._._.   _Y_o_u _c_a_n_'_t _d_o _t_h_a_t_!'' I hear
         you cry.  ``_A _s_i_n_g_l_e Makefile  _i_s  _t_o_o  _b_i_g_,  _i_t_'_s



    Peter Miller            19 March 2005                 Page 9





    AUUGN'97                   Recursive Make Considered Harmful


         _u_n_m_a_i_n_t_a_i_n_a_b_l_e_,  _i_t_'_s _t_o_o _h_a_r_d _t_o _w_r_i_t_e _t_h_e _r_u_l_e_s_,
         _y_o_u_'_l_l _r_u_n _o_u_t _o_f _m_e_m_o_r_y_, _I _o_n_l_y _w_a_n_t _t_o _b_u_i_l_d  _m_y
         _l_i_t_t_l_e  _b_i_t_,  _t_h_e  _b_u_i_l_d _w_i_l_l _t_a_k_e _t_o_o _l_o_n_g_.  _I_t_'_s
         _j_u_s_t _n_o_t _p_r_a_c_t_i_c_a_l_.''

    These are valid concerns, and they frequently lead make
    users to the conclusion that re-working their build process
    does not have any short- or long-term benefits.  This
    conclusion is based on ancient, enduring, false assumptions.

    The  following  sections will address each of these concerns
    in turn.

    44..11..  AA SSiinnggllee Makefile IIss TToooo BBiigg

    If the entire project build description were placed into a
    single Makefile this would certainly be true; however,
    modern make implementations have include statements.  By
    including a relevant fragment from each module, the total
    size of the Makefile and its include files need be no
    larger than the total size of the Makefiles in the
    recursive case.

    44..22..  AA SSiinnggllee Makefile IIss UUnnmmaaiinnttaaiinnaabbllee

    Using a single top-level Makefile which includes a fragment
    from each module is no more complex than the recursive case.
    Because the DAG is not segmented, this form of Makefile
    becomes less complex, and thus more maintainable, simply
    because fewer ``tweaks'' are required to keep it working.

    Recursive Makefiles have a great deal of  repetition.   Many
    projects solve this by using include files.  By using a sin-
    gle Makefile for the project, the need  for  the  ``common''
    include files disappears - the single Makefile is the common
    part.

    44..33..  IItt''ss TToooo HHaarrdd TToo WWrriittee TThhee RRuulleess

    The only change required is to include the directory part in
    filenames in a number of places.  This is because the make
    is performed from the top-level directory; the current
    directory is not the one in which the file appears.  Where
    the output file is explicitly stated in a rule, this is not
    a problem.

    GCC allows a -o option in conjunction with the -c option,
    and GNU Make knows this.  This results in the implicit
    compilation rule placing the output in the correct place.
    Older and dumber C compilers, however, may not allow the -o
    option with the -c option, and will leave the object file in
    the top-level directory (i.e. the wrong directory).  There
    are three ways for you to fix this: get GNU Make and GCC,
    override the built-in rule with one which does the right
    thing, or complain to your vendor.

    Also, K&R C compilers will start the double-quote include
    path (#include "filename.h") from the current directory.
    This will not do what you want.  ANSI C compliant C
    compilers, however, start the double-quote include path from
    the directory in which the source file appears; thus, no
    source changes are required.  If you don't have an ANSI C
    compliant C compiler, you should consider installing GCC on
    your system as soon as possible.

    44..44..  II OOnnllyy WWaanntt TToo BBuuiilldd MMyy LLiittttllee BBiitt

    Most of the time, developers are deep within the project
    tree and they edit one or two files and then run make to
    compile their changes and try them out.  They may do this
    dozens or hundreds of times a day.  Being forced to do a
    full project build every time would be absurd.

    Developers always have the option of giving make a specific
    target.  This has always been possible; it's just that we
    usually rely on the default target in the Makefile in the
    current directory to shorten the command line for us.
    Building ``my little bit'' can still be done with a whole
    project Makefile, simply by using a specific target, and an
    alias if the command line is too long.

    Is doing a full project build every time so absurd?  If a
    change made in a module has repercussions in other modules,
    because there is a dependency the developer is unaware of
    (but the Makefile is aware of), isn't it better that the
    developer find out as early as possible?  Dependencies like
    this will be found, because the DAG is more complete than in
    the recursive case.
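With a whole-project Makefile, building ``my little bit'' means asking for a specific target. An example is ``make ant/main.o'' run from the top-level directory, assuming the directory-qualified target names described in section 4.3. The Python sketch below uses a hypothetical helper, subgraph, over the complete DAG of the yacc variant of the example. It lists the files make would have to examine for that one target.

```python
# Complete DAG for the yacc variant of the ant/bee example, with the
# directory-qualified names a top-level make would use.
WHOLE = {
    "bee/prog": ["ant/main.o", "bee/parse.o"],
    "ant/main.o": ["ant/main.c", "bee/parse.h"],
    "bee/parse.o": ["bee/parse.c", "bee/parse.h"],
    "bee/parse.c": ["bee/parse.y"],
    "bee/parse.h": ["bee/parse.y"],
}


def subgraph(target, deps, seen=None):
    """All files make must examine to bring `target` up to date."""
    if seen is None:
        seen = set()
    if target not in seen:
        seen.add(target)
        for dep in deps.get(target, []):
            subgraph(dep, deps, seen)
    return seen


# "Just my little bit": ask for ant's object file only.
print(sorted(subgraph("ant/main.o", WHOLE)))
# ['ant/main.c', 'ant/main.o', 'bee/parse.h', 'bee/parse.y']
```

The build stays small: bee/parse.o and bee/prog are not examined at all. The edge from main.o to parse.h is still followed, however, so an edited parse.y is regenerated before main.o is compiled.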

    The developer is rarely a seasoned old salt who knows  every
    one  of  the  million  lines  of  code in the product.  More
    likely the developer is a short-term contractor or a junior.
    You  don't want implications like these to blow up after the
    changes are integrated with the master source, you want them
    to  blow  up on the developer in some nice safe sand-box far
    away from the master source.

    If you want to make ``just your little bit'' because you are
    concerned that performing a full project build will corrupt
    the project master source, due to the directory structure
    used in your project, see the ``Projects versus Sand-Boxes''
    section below.

    44..55..  TThhee BBuuiilldd WWiillll TTaakkee TToooo LLoonngg

    This statement can be made from one of two perspectives.
    First, that a whole project make, even when everything is
    up-to-date, inevitably takes a long time to perform.
    Secondly, that these inevitable delays are unacceptable when
    a developer wants to quickly compile and link the one file
    that they have changed.

    44..55..11..  PPrroojjeecctt BBuuiillddss

    Consider a hypothetical project with 1000 source (.c) files,
    each of which has its calling interface defined in a
    corresponding include (.h) file with defines, type
    declarations and function prototypes.  These 1000 source
    files include their own interface definition, plus the
    interface definitions of any other module they may call.
    These 1000 source files are compiled into 1000 object files
    which are then linked into an executable program.  This
    system has some 3000 files which make must be told about,
    along with their include dependencies, and make must also
    explore the possibility that implicit rules (.y -> .c for
    example) may be necessary.

    In order to build the DAG, make must ``stat'' 3000 files,
    plus an additional 2000 files or so, depending on which
    implicit rules your make knows about and your Makefile has
    left enabled.  On the author's humble 66MHz i486 this takes
    about 10 seconds; on native disk on faster platforms it goes
    even faster.  With NFS over 10Mb/s Ethernet it takes about
    10 seconds, no matter what the platform.

    This is an astonishing statistic!  Imagine being able to  do
    a  single file compile, out of 1000 source files, in only 10
    seconds, plus the time for the compilation itself.

    Breaking the set of files up into 100 modules, and running
    it as a recursive make takes about 25 seconds.  The repeated
    process creation for the subordinate make invocations takes
    quite a long time.

    Hang on a minute!  On real-world projects with fewer than
    1000 files, it takes an awful lot longer than 25 seconds for
    make to work out that it has nothing to do.  For some
    projects, doing it in only 25 minutes would be an
    improvement!  The above result tells us that it is not the
    number of files which is slowing us down (that only takes 10
    seconds), and it is not the repeated process creation for
    the subordinate make invocations (that only takes another 15
    seconds).  So just what is taking so long?

    The traditional solutions to the problems introduced by
    recursive make often increase the number of subordinate make
    invocations beyond the minimum described here; e.g. to
    perform multiple repetitions (3.3.2), or to overkill
    cross-module dependencies (3.3.3). These can take a long
    time, particularly when combined, but do not account for some
    of the more spectacular build times; what else is taking so
    long?





    Peter Miller            19 March 2005                Page 12





    AUUGN'97                   Recursive Make Considered Harmful


    Complexity of the Makefile is what is taking so long. This
    is covered, below, in the Efficient Makefiles section.

    44..55..22..  DDeevveellooppmmeenntt BBuuiillddss

    If, as in the 1000 file example, it only takes 10 seconds to
    figure out which one of the files needs to be recompiled,
    there is no serious threat to the productivity of developers
    if they do a whole-project make as opposed to a
    module-specific make. The advantage for the project is that
    the module-centric developer is reminded at relevant times
    (and only relevant times) that their work has wider
    ramifications.

    Consistently using C include files which contain accurate
    interface definitions (including function prototypes) will
    produce compilation errors in many of the cases which would
    otherwise result in a defective product. By doing
    whole-project builds, developers discover such errors very
    early in the development process, and can fix the problems
    when they are least expensive.

    44..66..  YYoouu''llll RRuunn OOuutt OOff MMeemmoorryy

    This is the most interesting response. Once long ago, on a
    CPU far, far away, it may even have been true. When Feldman
    [feld78] first wrote make it was 1978 and he was using a
    PDP11. Unix processes were limited to 64KB of data.

    On such a computer, the above project with its 3000 files
    detailed in the whole-project Makefile would probably not
    allow the DAG and rule actions to fit in memory.

    But we are not using PDP11s any more. The physical memory
    of modern computers exceeds 10MB for small computers, and
    virtual memory often exceeds 100MB. It is going to take a
    project with hundreds of thousands of source files to
    exhaust virtual memory on a small modern computer. As the
    1000 source file example takes less than 100KB of memory
    (try it, I did) it is unlikely that any project manageable
    in a single directory tree on a single disk will exhaust
    your computer's memory.

    44..77..  WWhhyy NNoott FFiixx TThhee DDAAGG IInn TThhee MMoodduulleess??

    It was shown in the above discussion that the problem with
    recursive make is that the DAGs are incomplete. It follows
    that by adding the missing portions, the problems would be
    resolved without abandoning the existing recursive make
    investment. In practice, this approach has several drawbacks:

    +o The developer needs to remember to do this. The problems
      will not affect the developer of the module; they will
      affect the developers of other modules. There is no
      trigger to remind the developer to do this, other than the
      ire of fellow developers.

    +o It is difficult to work out where the changes need  to  be
      made.   Potentially  every  Makefile in the entire project
      needs to  be  examined  for  possible  modifications.   Of
      course,  you  can  wait for your fellow developers to find
      them for you.

    +o The include dependencies will be recomputed unnecessarily,
      or will be interpreted incorrectly. This is because make
      is string based, and thus ``.'' and ``../ant'' are two
      different places, even when you are in the ant directory.
      This is of concern when include dependencies are
      automatically generated, as they are for all large projects.
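    The string-based comparison in the last point can be seen
    from the shell. The functions below are a sketch with
    hypothetical names (canonical, same_name, same_path): they
    compare two spellings of one header first as plain strings,
    which is how make compares file names, and then as resolved
    physical paths, which make does not do.

```sh
#!/bin/sh
# Compare two spellings of a file name the way make does (as
# strings) and the way the file system does (as resolved paths).

# canonical FILE: print the physical absolute path of FILE.
# The directory part of FILE must exist.
canonical() (
    cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd -P)" "$(basename "$1")"
)

# same_name A B: succeed when A and B are identical strings.
same_name() {
    [ "$1" = "$2" ]
}

# same_path A B: succeed when A and B resolve to the same path.
same_path() {
    [ "$(canonical "$1")" = "$(canonical "$2")" ]
}
```

    Run from inside the ant directory, same_name defs.h
    ../ant/defs.h fails while same_path defs.h ../ant/defs.h
    succeeds; make, seeing only the strings, treats the two
    spellings as two different vertices.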

    By making sure that each Makefile is complete, you arrive at
    the point where the Makefile for at least one module
    contains the equivalent of a whole-project Makefile (recall
    that these modules form a single project and are thus
    interconnected), and there is no need for the recursion any
    more.

    55..  EEffffiicciieenntt MMaakkeeffiilleess

    The central theme of this paper is the semantic side-effects
    of artificially separating a Makefile into the pieces
    necessary to perform a recursive make. However, once you have
    a large number of Makefiles, the speed at which make can
    interpret this multitude of files also becomes an issue.

    Builds can take ``forever'' for both these reasons: the
    traditional fixes for the separated DAG may be building too
    much and your Makefile may be inefficient.

    55..11..  DDeeffeerrrreedd EEvvaalluuaattiioonn

    The text in a Makefile must somehow be read from a text file
    and understood by make so that the DAG can be constructed,
    and the specified actions attached to the edges. This is
    all kept in memory.

    The input language for Makefiles is deceptively simple. A
    crucial distinction that often escapes novices and experts
    alike is that make's input language is text based, as
    opposed to token based, as is the case for C or AWK. Make
    does the very least possible to process input lines and
    stash them away in memory.

    As an example of this, consider the following assignment:

                    +--------------------------+
                    |OBJ = main.o parse.o      |
                    +--------------------------+
    Humans read this as the variable OBJ being assigned two
    filenames ``main.o'' and ``parse.o''. But make does not see
    it that way. Instead OBJ is assigned the string ``main.o
    parse.o''. It gets worse:

                    +--------------------------+
                    |SRC = main.c parse.c      |
                    |OBJ = $(SRC:.c=.o)        |
                    +--------------------------+
    In this case humans expect make to assign two filenames to
    OBJ, but make actually assigns the string ``$(SRC:.c=.o)''.
    This is because it is a macro language with deferred
    evaluation, as opposed to one with variables and immediate
    evaluation.

    If this does not seem too problematic, consider the
    following Makefile:

                   +-----------------------------+
                   |SRC = $(shell echo 'Ouch!' \ |
                   |  1>&2 ; echo *.[cy])        |
                   |OBJ = \                      |
                   |  $(patsubst %.c,%.o,\       |
                   |    $(filter %.c,$(SRC))) \  |
                   |  $(patsubst %.y,%.o,\       |
                   |    $(filter %.y,$(SRC)))    |
                   |test: $(OBJ)                 |
                   |  $(CC) -o $@ $(OBJ)         |
                   +-----------------------------+
    How many times will the shell command be executed? Ouch!
    It will be executed twice just to construct the DAG (OBJ
    names SRC twice, and OBJ is expanded in the prerequisite list
    of test), and a further two times if the rule needs to be
    executed (when the $(OBJ) in the recipe is expanded).

    If this shell command does anything complex or time
    consuming (and it usually does) it will take four times
    longer than you thought.

    But it is worth looking at the other portions  of  that  OBJ
    macro.   Each  time it is named, a huge amount of processing
    is performed:

    +o The argument to _s_h_e_l_l is a single  string  (all  built-in-
      functions  take  a single string argument).  The string is
      executed in a sub-shell, and the standard output  of  this
      command is read back in, translating newlines into spaces.
      The result is a single string.

    +o The argument to _f_i_l_t_e_r is a single string.  This  argument
      is  broken into two strings at the first comma.  These two
      strings are then each broken into sub-strings separated by
      spaces.   The  first  set are the patterns, the second set
      are the filenames.  Then, for each  of  the  pattern  sub-
      strings,  if  a filename sub-string matches it, that file-
      name is included in the output.  Once all  of  the  output
      has  been  found,  it is re-assembled into a single space-



    Peter Miller            19 March 2005                Page 15





    AUUGN'97                   Recursive Make Considered Harmful


      separated string.

    +o The argument to _p_a_t_s_u_b_s_t is a single string.   This  argu-
      ment  is broken into three strings at the first and second
      commas.  The third string is then broken into  sub-strings
      separated  by  spaces, these are the filenames.  Then, for
      each of the filenames which match the first string  it  is
      substituted according to the second string.  If a filename
      does not match, it is passed through unchanged.  Once  all
      of  the output has been generated, it is re-assembled into
      a single space-separated string.

    Notice how many times those strings are disassembled and
    reassembled. Notice how many ways that happens. This is
    slow. The example here names just two files but consider
    how inefficient this would be for 1000 files. Doing it four
    times becomes decidedly inefficient.
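    To make the splitting and rejoining concrete, here is a rough
    emulation of filter and patsubst in POSIX shell. It is a
    sketch under stated assumptions, not how any make is
    implemented: the names mk_filter and mk_patsubst are
    hypothetical, and only patterns of the form %SUFFIX are
    handled. The last lines compute the OBJ value from the
    example above.

```sh
#!/bin/sh
# Rough emulation of $(filter) and $(patsubst), restricted to
# suffix patterns (%SUFFIX). Each call splits its word list on
# whitespace and glues the result back together.

# mk_filter SUFFIX WORDS: keep the words that end in SUFFIX.
mk_filter() (
    set -f                      # do not glob while splitting words
    suffix=$1
    out=
    for w in $2; do
        case $w in
        *"$suffix") out="$out $w" ;;
        esac
    done
    printf '%s\n' "${out# }"    # rejoin with single spaces
)

# mk_patsubst FROM TO WORDS: replace suffix FROM with TO on the
# words ending in FROM; other words pass through unchanged.
mk_patsubst() (
    set -f
    from=$1
    to=$2
    out=
    for w in $3; do
        case $w in
        *"$from") out="$out ${w%"$from"}$to" ;;
        *) out="$out $w" ;;
        esac
    done
    printf '%s\n' "${out# }"
)

SRC='main.c parse.y'
# The OBJ expression from the example: two filters, two patsubsts.
OBJ="$(mk_patsubst .c .o "$(mk_filter .c "$SRC")") $(mk_patsubst .y .o "$(mk_filter .y "$SRC")")"
printf '%s\n' "$OBJ"            # prints: main.o parse.o
```

    Each evaluation of OBJ splits and rejoins a word list four
    times; with deferred evaluation that work is repeated every
    time OBJ is named.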

    If you are using a dumb _m_a_k_e that has no  substitutions  and
    no  built-in  functions, this cannot bite you.  But a modern
    _m_a_k_e has lots of built-in  functions  and  can  even  invoke
    shell  commands  on-the-fly.   The  semantics of _m_a_k_e's text
    manipulation is such that string  manipulation  in  _m_a_k_e  is
    very  CPU  intensive, compared to performing the same string
    manipulations in C or AWK.

    55..22..  IImmmmeeddiiaattee EEvvaalluuaattiioonn

    Modern _m_a_k_e implementations  have  an  immediate  evaluation
    ``:=''  assignment  operator.   The above example can be re-
    written as

                  +------------------------------+
                  |SRC := $(shell echo 'Ouch!' \ |
                  |  1>&2 ; echo *.[cy])         |
                  |OBJ := \                      |
                  |  $(patsubst %.c,%.o,\        |
                  |    $(filter %.c,$(SRC))) \   |
                  |  $(patsubst %.y,%.o,\        |
                  |    $(filter %.y,$(SRC)))     |
                  |test: $(OBJ)                  |
                  |  $(CC) -o $@ $(OBJ)          |
                  +------------------------------+
    Note that _b_o_t_h assignments are immediate evaluation  assign-
    ments.   If  the  first  were  not,  the shell command would
    always be executed twice.   If  the  second  were  not,  the
    expensive  substitutions  would  be performed at least twice
    and possibly four times.

    As a rule of thumb: always use immediate evaluation
    assignment unless you knowingly want deferred evaluation.
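    The execution counts given above can be checked directly.
    The shell function below is a sketch that assumes GNU Make is
    installed as make; the function name count_shell_runs and the
    log file count.log are hypothetical. It writes the example
    Makefile into a temporary directory using either = or := for
    both assignments, replaces compilation with a do-nothing %.o
    rule, runs make, and counts how often the shell command ran.

```sh
#!/bin/sh
# Count how often the $(shell) in the example Makefile runs.
# Usage: count_shell_runs OP    where OP is "=" or ":="
# Assumes GNU Make is installed as "make".
count_shell_runs() (
    op=$1
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    # Each run of the shell command appends one line to count.log.
    # Rules use the "target: prereqs ; recipe" form so no tab
    # characters are needed; "%.o: ; @:" stands in for the compiler.
    printf '%s\n' \
        "SRC $op"' $(shell echo x >> count.log; echo main.c parse.y)' \
        "OBJ $op"' $(patsubst %.c,%.o,$(filter %.c,$(SRC))) $(patsubst %.y,%.o,$(filter %.y,$(SRC)))' \
        'test: $(OBJ) ; @echo $(OBJ)' \
        '%.o: ; @:' > Makefile
    make -r -s >/dev/null 2>&1    # -r: no built-in implicit rules
    grep -c x count.log           # number of shell command runs
    cd / && rm -rf "$dir"
)
```

    Under GNU Make, count_shell_runs = should print 4 and
    count_shell_runs := should print 1, matching the deferred and
    immediate cases described in the text.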

    55..33..  IInncclluuddee FFiilleess

    Many Makefiles perform the same text processing (the filters
    above, for example) for every single make run, but the
    results of the processing rarely change. Wherever
    practical, it is more efficient to record the results of the
    text processing into a file, and have the Makefile include
    this file.
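    One way to do this is sketched below, under assumptions: the
    helper name gen_objects_mk and the include file name
    objects.mk are hypothetical. The script performs the same
    .c/.y to .o mapping as the filter/patsubst example, writes it
    as an OBJ := line, and leaves the file untouched when the
    result has not changed, so an unchanged result does not get a
    new timestamp.

```sh
#!/bin/sh
# Hypothetical helper: precompute the OBJ list once and record it
# in an include file, instead of re-running $(filter) and
# $(patsubst) on every make run. Only .c and .y sources yield objects.
# Usage: gen_objects_mk OUTFILE SOURCE...
gen_objects_mk() {
    out=$1
    shift
    objs=
    for src in "$@"; do
        case $src in
        *.c) objs="$objs ${src%.c}.o" ;;
        *.y) objs="$objs ${src%.y}.o" ;;
        esac                    # other files contribute no object
    done
    tmp=$out.tmp.$$
    printf 'OBJ :=%s\n' "$objs" > "$tmp"
    # Replace OUTFILE only when its content changes, so an
    # unchanged result keeps its old timestamp.
    if cmp -s "$tmp" "$out"; then
        rm -f "$tmp"
        echo unchanged
    else
        mv "$tmp" "$out"
        echo updated
    fi
}
```

    The Makefile would then read the result with a line such as
    include objects.mk; when the script is re-run is left to the
    project.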

    55..44..  DDeeppeennddeenncciieess

    Don't be miserly with include files.   They  are  relatively
    inexpensive  to  read,  compared to $(shell), so more rather
    than less doesn't greatly affect efficiency.

    As an example of this, it is first necessary to describe a
    useful feature of GNU Make: once a Makefile has been read
    in, if any of its included files were out-of-date (or do not
    yet exist), they are re-built, and then make starts again,
    which has the result that make is now working with
    up-to-date include files. This feature can be exploited to
    obtain automatic include file dependency tracking for C
    sources. The obvious way to implement it, however, has a
    subtle flaw.

                    +--------------------------+
                    |SRC := $(wildcard *.c)    |
                    |OBJ := $(SRC:.c=.o)       |
                    |test: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |include dependencies      |
                    |dependencies: $(SRC)      |
                    |  depend.sh $(CFLAGS) \   |
                    |    $(SRC) > $@           |
                    +--------------------------+
    The depend.sh script prints lines of the form

         _f_i_l_e.o: _f_i_l_e.c _i_n_c_l_u_d_e.h ...

    The simplest implementation of this is to use GCC, but
    you will need an equivalent awk script or C program if you
    have a different compiler:

                    +--------------------------+
                    |#!/bin/sh                 |
                    |gcc -MM -MG "$@"          |
                    +--------------------------+
    This implementation of tracking C include dependencies has
    several serious flaws, but the one most commonly discovered
    is that the dependencies file does not, itself, depend on
    the C include files. That is, it is not re-built if one of
    the include files changes. There is no edge in the DAG
    joining the dependencies vertex to any of the include file
    vertices. If an include file changes to include another
    file (a nested include), the dependencies will not be
    recalculated, and potentially the C file will not be
    recompiled, and thus the program will not be re-built
    correctly.

    A classic build-too-little problem, caused by giving make
    inadequate information, and thus causing it to build an
    inadequate DAG and reach the wrong conclusion.

    The traditional solution is to build too much:

                    +--------------------------+
                    |SRC := $(wildcard *.c)    |
                    |OBJ := $(SRC:.c=.o)       |
                    |test: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |include dependencies      |
                    |.PHONY: dependencies      |
                    |dependencies: $(SRC)      |
                    |  depend.sh $(CFLAGS) \   |
                    |    $(SRC) > $@           |
                    +--------------------------+
    Now, even if the project is completely up-to-date, the
    dependencies will be re-built. For a large project, this is
    very wasteful, and can be a major contributor to make taking
    ``forever'' to work out that nothing needs to be done.

    There is a second problem, and that is that if any one of
    the C files changes, all of the C files will be re-scanned
    for include dependencies. This is as inefficient as having
    a Makefile which reads

                    +--------------------------+
                    |prog: $(SRC)              |
                    |  $(CC) -o $@ $(SRC)      |
                    +--------------------------+
    What is needed, in exact analogy to the C case, is to have
    an intermediate form. This is usually given a ``.d''
    suffix. By exploiting the fact that more than one file may be
    named in an include line, there is no need to ``link'' all
    of the ``.d'' files together:

                  +------------------------------+
                  |SRC := $(wildcard *.c)        |
                  |OBJ := $(SRC:.c=.o)           |
                  |test: $(OBJ)                  |
                  |  $(CC) -o $@ $(OBJ)          |
                  |include $(OBJ:.o=.d)          |
                  |%.d: %.c                      |
                  |  depend.sh $(CFLAGS) $< > $@ |
                  +------------------------------+

    This has one more thing to fix:  just  as  the  object  (.o)
    files  depend  on the source files and the include files, so
    do the dependency (.d) files.

         _f_i_l_e.d _f_i_l_e.o: _f_i_l_e.c _i_n_c_l_u_d_e.h

    This means tinkering with the depend.sh script again:

                +-----------------------------------+
                |#!/bin/sh                          |
                |gcc -MM -MG "$@" |                 |
                |sed -e 's@^\(.*\)\.o:@\1.d \1.o:@' |
                +-----------------------------------+

    This method of determining include file dependencies results
    in the Makefile including more files than the original
    method, but opening files is less expensive than rebuilding
    all of the dependencies every time. Typically a developer
    will edit one or two files before re-building; this method
    will rebuild the exact dependency file affected (or more
    than one, if you edited an include file). On balance, this
    will use less CPU, and less time.

    In the case of a build where nothing needs to be done, make
    will actually do nothing, and will work this out very
    quickly.

    However, the above technique assumes your project fits
    entirely within a single directory. For large projects, this
    usually isn't the case. This means tinkering with the
    depend.sh script again:

           +---------------------------------------------+
           |#!/bin/sh                                    |
           |DIR="$1"                                     |
           |shift 1                                      |
           |case "$DIR" in                               |
           |"" | ".")                                    |
           |gcc -MM -MG "$@" |                           |
           |sed -e 's@^\(.*\)\.o:@\1.d \1.o:@'           |
           |;;                                           |
           |*)                                           |
           |gcc -MM -MG "$@" |                           |
           |sed -e "s@^\(.*\)\.o:@$DIR/\1.d $DIR/\1.o:@" |
           |;;                                           |
           |esac                                         |
           +---------------------------------------------+
    And the rule needs to change, too, to pass the directory  as
    the first argument, as the script expects.

            +-------------------------------------------+
            |%.d: %.c                                   |
            |  depend.sh `dirname $*` $(CFLAGS) $< > $@ |
            +-------------------------------------------+
    Note  that  the  .d  files will be relative to the top level
    directory.  Writing them so that they can be used  from  any
    level is possible, but beyond the scope of this paper.
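    The sed step of the directory-aware script can be tried on
    its own without running gcc. The sketch below pulls it out
    into a function (the name prefix_targets is hypothetical) and
    feeds it a line in the form gcc -MM prints; gcc names the
    default target after the source file with its directory
    removed, which is why the directory has to be added back. It
    assumes DIR contains no ``@'' character, since that is the
    sed delimiter.

```sh
#!/bin/sh
# The sed step of the directory-aware depend.sh as a function.
# Usage: prefix_targets DIR < gcc-MM-output
prefix_targets() {
    case "$1" in
    "" | ".")
        # Top level: "x.o:" becomes "x.d x.o:".
        sed -e 's@^\(.*\)\.o:@\1.d \1.o:@'
        ;;
    *)
        # Subdirectory: "x.o:" becomes "DIR/x.d DIR/x.o:".
        sed -e "s@^\(.*\)\.o:@$1/\1.d $1/\1.o:@"
        ;;
    esac
}

# A dependency line for ant/main.c (file names are illustrative).
printf 'main.o: ant/main.c ant/defs.h\n' | prefix_targets ant
# prints: ant/main.d ant/main.o: ant/main.c ant/defs.h
```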

    55..55..  MMuullttiipplliieerr

    All of the inefficiencies described in this section compound
    together. If you do 100 Makefile interpretations, once for
    each module, checking 1000 source files can take a very long
    time if the interpretation requires complex processing or
    performs unnecessary work, or both. A whole project make,
    on the other hand, only needs to interpret a single
    Makefile.

    66..  PPrroojjeeccttss _v_e_r_s_u_s SSaanndd--bbooxxeess

    The above discussion assumes that a project resides under a
    single directory tree, and this is often the ideal.
    However, the realities of working with large software
    projects often lead to weird and wonderful directory
    structures in order to have developers working on different
    sections of the project without taking complete copies and
    thereby wasting precious disk space.

    It is possible to see the whole-project make proposed here
    as impractical, because it does not match the evolved
    methods of your development process.

    The whole-project make proposed here does have an effect on
    development methods: it can give you cleaner and simpler
    build environments for your developers. By using make's
    VPATH feature, it is possible to copy only those files you
    need to edit into your private work area, often called a
    sand-box.

    The simplest explanation of what VPATH does is to make an
    analogy with the include file search path specified using
    -Ipath options to the C compiler. This set of options
    describes where to look for files, just as VPATH tells make
    where to look for files.

    By using VPATH, it is possible to ``stack'' the sand-box on
    top of the project master source, so that files in the
    sand-box take precedence, but it is the union of all the
    files which make uses to perform the build.
           Master Source      Sand-Box        Combined View
             main.c             main.c          main.c
             parse.y            variable.c      parse.y
                                                variable.c
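    The precedence rule can be modelled in a few lines of shell.
    This is a toy sketch, not make's actual search algorithm (the
    function name vpath_lookup is hypothetical, and real make also
    has rules about where rebuilt targets are placed): each
    directory is searched in order and the first match wins, so
    listing the sand-box before the master source gives the
    combined view shown in the figure.

```sh
#!/bin/sh
# Toy model of VPATH stacking: print the first DIR/FILE that
# exists, searching the directories in the order given.
# Usage: vpath_lookup FILE DIR...
vpath_lookup() {
    file=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$file" ]; then
            printf '%s\n' "$dir/$file"
            return 0
        fi
    done
    return 1                    # not found anywhere on the path
}
```

    With the sand-box and master source from the figure, main.c
    and variable.c resolve to the sand-box copies and parse.y to
    the master source.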


    In this environment, the sand-box has the same tree
    structure as the project master source. This allows
    developers to safely change things across separate modules,
    e.g. if they are changing a module interface. It also allows
    the sand-box to be physically separate, perhaps on a
    different disk, or under their home directory. It also
    allows the project master source to be read-only, if you
    have (or would like) a rigorous check-in procedure.

    Note: in addition to adding a VPATH line to your development
    Makefile, you will also need to add -I options to the CFLAGS
    macro, so that the C compiler uses the same path as make
    does. This is simply done with a 3-line Makefile in your
    work area: set a macro, set the VPATH, and then include the
    Makefile from the project master source.

    66..11..  VVPPAATTHH SSeemmaannttiiccss

    For  the above discussion to apply, you need to use GNU make
    3.76 or later.  For versions of GNU Make earlier than  3.76,
    you  will  need  Paul  Smith's  VPATH+  patch.   This may be
    obtained from  ftp://ftp.wellfleet.com/netman/psmith/gmake/.

    The POSIX semantics of VPATH are slightly brain-dead, so
    many other make implementations are too limited. You may
    want to consider installing GNU Make.

    77..  TThhee BBiigg PPiiccttuurree

    This section brings together all of the preceding
    discussion, and presents the example project with its
    separate modules, but with a whole-project Makefile. The
    directory structure is changed little from the recursive
    case, except that the deeper Makefiles are replaced by module
    specific include files:
                          Project
                          +-- Makefile
                          +-- ant
                          |   +-- module.mk
                          |   +-- main.c
                          +-- bee
                          |   +-- module.mk
                          |   +-- parse.y
                          +-- depend.sh

    The Makefile looks like this:

          +-----------------------------------------------+
          |MODULES := ant bee                             |
          |# look for include files in                    |
          |#   each of the modules                        |
          |CFLAGS += $(patsubst %,-I%,\                   |
          |  $(MODULES))                                  |
          |# extra libraries if required                  |
          |LIBS :=                                        |
          |# each module will add to this                 |
          |SRC :=                                         |
          |# include the description for                  |
          |#   each module                                |
          |include $(patsubst %,\                         |
          |    %/module.mk,$(MODULES))                    |
          |# determine the object files                   |
          |OBJ :=                    \                    |
          |  $(patsubst %.c,%.o,     \                    |
          |    $(filter %.c,$(SRC))) \                    |
          |  $(patsubst %.y,%.o,     \                    |
          |    $(filter %.y,$(SRC)))                      |
          |# link the program                             |
          |prog: $(OBJ)                                   |
          |  $(CC) -o $@ $(OBJ) $(LIBS)                   |
          |# include the C include                        |
          |#   dependencies                               |
          |include $(OBJ:.o=.d)                           |
          |# calculate C include                          |
          |#   dependencies                               |
          |%.d: %.c                                       |
          |  depend.sh `dirname $*.c` $(CFLAGS) $*.c > $@ |
          +-----------------------------------------------+
    This looks absurdly large, but it has all of the common
    elements in the one place, so that each of the modules' make
    includes may be small.

    The ant/module.mk file looks like:

                    +--------------------------+
                    |SRC += ant/main.c         |
                    +--------------------------+
    The bee/module.mk file looks like:

                    +--------------------------+
                    |SRC += bee/parse.y        |
                    |LIBS += -ly               |
                    |%.c %.h: %.y              |
                    |  $(YACC) -d $*.y         |
                    |  mv y.tab.c $*.c         |
                    |  mv y.tab.h $*.h         |
                    +--------------------------+

    Notice that the built-in rules are used for the C files, but
    we  need  special  yacc  processing  to get the generated .h
    file.

    The savings in this example look irrelevant, because the
    top-level Makefile is so large. But consider if there were
    100 modules, each with only a few non-comment lines, and
    those specifically relevant to the module. The savings soon
    add up to a total size often less than the recursive case,
    without loss of modularity.

    The equivalent DAG of the Makefile after all of the includes
    looks like this:
                                prog

                        main.o         parse.o
                        main.d         parse.d

                 main.c        parse.h        parse.c

                               parse.y

    The vertices and edges for the include file dependency files
    are also present, as these are important for make to
    function correctly.

    77..11..  SSiiddee EEffffeeccttss

    There are a couple of desirable side-effects of using a
    single Makefile.

    +o  The  GNU  Make -j option, for parallel builds, works even
    better than before.  It can find even more unrelated  things
    to do at once, and no longer has some subtle problems.

    +o The general make -k option, to continue as far as possible
    even in the face of errors, works even better  than  before.
    It can find even more things to continue with.

    88..  LLiitteerraattuurree SSuurrvveeyy

    How can it be possible that we have been misusing make for
    20 years? How can it be possible that behavior previously
    ascribed to make's limitations is in fact a result of
    misusing it?

    The author only started thinking about the ideas presented
    in this paper when faced with a number of ugly build
    problems on utterly different projects, but with common
    symptoms. By stepping back from the individual projects, and
    closely examining the thing they had in common, make, it
    became possible to see the larger pattern. Most of us are
    so caught up in the minutiae of just getting the rotten
    build to work that we don't have time to spare for the big
    picture. Especially when the item in question ``obviously''
    works, and has done so continuously for the last 20 years.

    It is interesting that the problems of recursive make are
    rarely mentioned in the very books Unix programmers rely on
    for accurate, practical advice.

    88..11..  TThhee OOrriiggiinnaall PPaappeerr

    The original make paper [feld78] contains no reference to
    recursive make, let alone any discussion as to the relative
    merits of whole project make over recursive make.

    It is hardly surprising that the original paper did not
    discuss recursive make; Unix projects at the time usually did
    fit into a single directory.

    It may be this which set the ``one Makefile in every
    directory'' concept so firmly in the collective Unix
    development mind-set.

    88..22..  GGNNUU MMaakkee

    The GNU Make manual [stal93] contains several pages of
    material concerning recursive make; however, its discussion
    of the merits or otherwise of the technique is limited to the
    brief statement that

         ``This technique is useful when you want to
         separate makefiles for various subsystems that compose
         a larger system.''

    No mention is made of the problems you may encounter.

    88..33..  MMaannaaggiinngg PPrroojjeeccttss wwiitthh MMaakkee

    The Nutshell Make book [talb91] specifically promotes
    recursive make over whole project make because

         ``The cleanest way to build is to put a separate
         description file in each directory, and tie them
         together through a master description file that
         invokes make recursively. While cumbersome, the
         technique is easier to maintain than a single,
         enormous file that covers multiple directories.''
         (p. 65)

    This is despite the book's advice only two paragraphs earlier
    that

         "make is happiest when you keep all your files in a
         single directory." (p. 64)

    Yet the book fails to discuss the contradiction in these two
    statements, and goes on to describe one of the traditional
    ways of treating the symptoms of incomplete DAGs caused by
    recursive make.

    The book may give us a clue as to why recursive make has been
    used in this way for so many years.  Notice how the above
    quotes confuse the concept of a directory with the concept of
    a Makefile.

    This paper suggests a simple change to the mind-set:
    directory trees, however deep, are places to store files;
    Makefiles are places to describe the relationships between
    those files, however many.

    88..44..  BBSSDD MMaakkee

    The tutorial for BSD Make [debo88] says nothing at all about
    recursive make, but it is one of the few that actually
    describe, however briefly, the relationship between a
    Makefile and a DAG (p. 30).  There is also a wonderful quote:

         ``If _m_a_k_e doesn't do what you expect it to, it's a
         good chance the makefile is wrong.'' (p. 10)

    This is a pithy summary of the thesis of this paper.

    99..  SSuummmmaarryy

    This paper presents a number of related problems, and
    demonstrates that they are not inherent limitations of make,
    as is commonly believed, but are the result of presenting
    incorrect information to make.  This is the ancient Garbage
    In, Garbage Out principle at work.  Because make can only
    operate correctly with a complete DAG, the error is in
    segmenting the Makefile into incomplete pieces.
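
    As a minimal illustration of Garbage In, Garbage Out, consider
    the sketch below.  The file names main.c, util.c and util.h and
    the program name prog are hypothetical.  Make can act only on
    the edges it is told about.  Whether a rebuild is correct
    therefore depends entirely on the prerequisite lists, not on
    any cleverness in make itself.

```makefile
# Hypothetical program: main.c and util.c both #include "util.h".
# main.o and util.o are compiled by make's built-in .c -> .o rule;
# the dependency lines below only state which files each depends on.
prog: main.o util.o
	$(CC) -o prog main.o util.o

# If util.h is dropped from the next line, the DAG loses the edge
# main.o -> util.h.  After util.h is edited, make then rebuilds util.o
# and relinks prog, but keeps the stale main.o: garbage in, garbage out.
main.o: main.c util.h
util.o: util.c util.h
```
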

    This requires a shift in thinking: directory trees are simply
    a place to hold files; Makefiles are a place to remember
    relationships between files.  Do not confuse the two, because
    it is as important to accurately represent the relationships
    between files in different directories as it is to represent
    the relationships between files in the same directory.  This
    implies that there should be exactly one Makefile for a
    project, but the size of the description can be managed by
    using a make include file in each directory to describe the
    subset of the project files in that directory.  This is just
    as modular as having a Makefile in each directory.
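
    A minimal sketch of that arrangement follows.  It assumes GNU
    make, for the function calls and the multi-file include, and
    two hypothetical directories, libutil and app.  Three files are
    shown together: the single top-level Makefile and one small
    module.mk per directory.  Each fragment names only the files in
    its own directory.  Because all of them are read into one make
    session, the result is still a single DAG for the whole
    project.  Header dependencies are omitted for brevity.

```makefile
# --- Makefile (top of the tree; the only file given to make) ---
MODULES := libutil app

# Let any module's sources find headers in any other module.
CPPFLAGS += $(patsubst %,-I%,$(MODULES))

# Each module.mk appends the files in its own directory to SRC.
SRC :=
include $(patsubst %,%/module.mk,$(MODULES))

OBJ := $(patsubst %.c,%.o,$(filter %.c,$(SRC)))

prog: $(OBJ)
	$(CC) -o $@ $(OBJ) $(LDLIBS)

# --- libutil/module.mk ---
SRC += libutil/util.c

# --- app/module.mk ---
SRC += app/main.c
```

    Paths in each module.mk are written relative to the top of the
    tree.  Make is always run from there, never from inside a
    subdirectory.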

    This paper has shown how a project build and a development
    build can be equally brief for a whole-project make.  Given
    this parity of time, the gains provided by accurate
    dependencies mean that this process will, in fact, be faster
    and more accurate than the recursive make case.

    99..11..  IInntteerr--ddeeppeennddeenntt PPrroojjeeccttss

    In organizations with a strong culture of re-use,
    implementing whole-project make can present challenges.
    Rising to these challenges, however, may require looking at
    the bigger picture.

    o A module may be shared between two programs because the
      programs are closely related.  Clearly, the two programs
      plus the shared module belong to the same project (the
      module may be self-contained, but the programs are not).
      The dependencies must be explicitly stated, and changes to
      the module must result in both programs being recompiled
      and re-linked as appropriate.  Combining them all into a
      single project means that whole-project make can accomplish
      this.

    o A module may be shared between two projects because they
      must inter-operate.  Possibly your project is bigger than
      your current directory structure implies.  The dependencies
      must be explicitly stated, and changes to the module must
      result in both projects being recompiled and re-linked as
      appropriate.  Combining them all into a single project
      means that whole-project make can accomplish this.

    o It is the normal case to omit the edges between your
      project and the operating system or other installed
      third-party tools.  This is so normal that these edges are
      ignored in the Makefiles in this paper, and in the built-in
      rules of make programs.
      Modules shared between your projects may fall into a
      similar category: if they change, you will deliberately
      rebuild to include their changes, or quietly include their
      changes whenever the next build happens.  In either case,
      you do not explicitly state the dependencies, and
      whole-project make does not apply.

    o Re-use may be better served if the module is used as a
      template, and divergence between two projects is seen as
      normal.  Duplicating the module in each project allows the
      dependencies to be explicitly stated, but requires
      additional effort if the common portion needs maintenance.
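
    The first case above can be sketched as a fragment of a
    whole-project Makefile.  The sketch assumes GNU make and uses
    hypothetical directories alpha, beta and shared.  The edges
    from both programs to the shared module are stated in one DAG.
    A single make run therefore rebuilds everything that a change
    to the module affects, and nothing more.

```makefile
# Two hypothetical programs, alpha and beta, sharing one module.
# Assumes GNU make: its built-in %.o: %.c rule writes each object
# next to its source, and $^ expands to all prerequisites.
CPPFLAGS += -Ishared

all: alpha/alpha beta/beta

alpha/alpha: alpha/main.o shared/shared.o
	$(CC) -o $@ $^

beta/beta: beta/main.o shared/shared.o
	$(CC) -o $@ $^

# Explicit header edges: editing shared/shared.h recompiles all three
# objects and relinks both programs; editing shared/shared.c
# recompiles shared/shared.o and relinks both programs.
alpha/main.o: alpha/main.c shared/shared.h
beta/main.o: beta/main.c shared/shared.h
shared/shared.o: shared/shared.c shared/shared.h

.PHONY: all
```
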

    How to structure dependencies in a strong re-use environment
    thus becomes an exercise in risk management.  What is the
    danger that omitting chunks of the DAG will harm your
    projects?  How vital is it to rebuild if a module changes?
    What are the consequences of not rebuilding automatically?
    How can you tell when a rebuild is necessary if the
    dependencies are not explicitly stated?  What are the
    consequences of forgetting to rebuild?

    99..22..  RReettuurrnn OOnn IInnvveessttmmeenntt

    Some of the techniques presented in this paper will improve
    the speed of your builds, even if you continue to use
    recursive make.  These are not the focus of this paper, merely
    a useful detour.

    The focus of this paper is that you will get more accurate
    builds of your project if you use whole-project make rather
    than recursive make.

    o The time for make to work out that nothing needs to be done
      will not be more, and will often be less.

    o The size and complexity of the total Makefile input will not
      be more, and will often be less.

    o The total Makefile input is no less modular than in the
      recursive case.

    o The difficulty of maintaining the total Makefile input will
      not be more, and will often be less.

    The disadvantages of using recursive make rather than
    whole-project make are often unmeasured.  How much time is
    spent figuring out why make did something unexpected?  How
    much time is spent figuring out that make did something
    unexpected?  How much time is spent tinkering with the build
    process?  These activities are often thought of as "normal"
    development overheads.

    Building your project is a fundamental activity.  If it is
    performing poorly, so are development, debugging and testing.
    Building your project needs to be so simple that the newest
    recruit can do it immediately with only a single page of
    instructions.  Building your project needs to be so simple
    that it rarely needs any development effort at all.  Is your
    build process this simple?

    1100..  RReeffeerreenncceess


         ddeebboo8888:: Adam de Boor (1988).  _P_M_a_k_e _- _A _T_u_t_o_r_i_a_l.  Uni-
    versity of California, Berkeley

         ffeelldd7788:: Stuart I. Feldman (1978).  _M_a_k_e _- _A _P_r_o_g_r_a_m _f_o_r
    _M_a_i_n_t_a_i_n_i_n_g  _C_o_m_p_u_t_e_r _P_r_o_g_r_a_m_s.  Bell Laboratories Computing
    Science Technical Report 57

         ssttaall9933:: Richard M. Stallman and Roland McGrath  (1993).
    _G_N_U _M_a_k_e_: _A _P_r_o_g_r_a_m _f_o_r _D_i_r_e_c_t_i_n_g _R_e_c_o_m_p_i_l_a_t_i_o_n.  Free Soft-
    ware Foundation, Inc.

         ttaallbb9911:: Steve Talbott (1991).  _M_a_n_a_g_i_n_g  _P_r_o_j_e_c_t_s  _w_i_t_h
    _M_a_k_e_, _2_n_d _E_d.  O'Reilly & Associates, Inc.

    1111..  AAbboouutt tthhee AAuutthhoorr

    Peter Miller has worked for many years in the software R&D
    industry, principally on UNIX systems.  In that time he has
    written tools such as Aegis (a software configuration
    management system) and Cook (yet another make-oid), both of
    which are freely available on the Internet.  Supporting the
    use of these tools at many Internet sites provided the
    insights which led to this paper.

    Please  visit  http://www.canb.auug.org.au/~millerp/  if you
    would like to look at some of the author's free software.
