You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

1130 lines
45 KiB

25 years ago
  1. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  2. %%%
  3. %%% Version 12, 1999-09-07, Bruno
  4. %%% -----------------------------
  5. %%%
  6. %%% * Habe neue Referenzen zu den Karatsuba-Arbeiten hinzugef�gt, entnommen
  7. %%% dem gro�artigen Paper CECM-98-118 (98_118-Borwein-Bradley-Crandall.dvi).
  8. %%% * Habe in acm.bst, Funktion format.title, das change.case rausgenommen,
  9. %%% das die Namen von Euler, Riemann u.a. in Kleinbuchstaben konvertiert
  10. %%% hatte.
  11. %%%
  12. %%% Version 11, 1998-12-20, Bruno
  13. %%% -----------------------------
  14. %%%
  15. %%% * Habe Referenzen zu den Karatsuba-Arbeiten hinzugef�gt.
  16. %%%
  17. %%% Version 10, 1998-03-10, Thomas & Bruno
  18. %%% --------------------------------------
  19. %%%
  20. %%% * Korrigiere die Formel f�r a(n) bei zeta(3).
  21. %%% * Sch�nheitsfehler im Literaturverzeichnis.
  22. %%%
  23. %%% Version 9, 1998-03-06, Thomas & Bruno
  24. %%% -------------------------------------
  25. %%%
  26. %%% * Schreibe \frac{1}{|x|} statt \frac{1}{x}.
  27. %%%
  28. %%% Version 8, 1998-01-16b, Thomas
  29. %%% ------------------------------
  30. %%%
  31. %%% * Drei Literaturverweise f�r LiDIA statt nur einem.
  32. %%%
  33. %%% Version 7, 1998-01-16a, Bruno
  34. %%% -----------------------------
  35. %%%
  36. %%% * Adresse: Praefix F fuer Frankreich
  37. %%% * Abstract: Erwaehne zeta(3)
  38. %%% * Kleinere Korrekturen der O()-Abschaetzungen
  39. %%%
  40. %%% Version 6, 1998-01-14, Thomas
  41. %%% -----------------------------
  42. %%%
  43. %%% * habe meine Adresse ge"andert.
  44. %%%
  45. %%% * habe Resultat f"ur die Euler Konstante + Kettenbruch eingef"uhrt
  46. %%%
  47. %%% Version 5, 1997-12-11, Thomas
  48. %%% -----------------------------
  49. %%%
  50. %%% * Habe die Anzahl der Scritte bei der Euler Konstante von
  51. %%% x = ceiling(N log(2)/4)^2 auf x = ceiling((N+2) log(2)/4)^2
  52. %%% hochgesetzt (Mail von Sofroniou)
  53. %%%
  54. %%% * Habe Kommentar eingef"ugt bzgl Checkpointing bei der Euler
  55. %%% Konstante und Gamma(x) (Mail von Sofroniou)
  56. %%%
  57. %%% * Habe Section zu Geschwindigkeit von Maple gegen"uber Pari
  58. %%% verbessert (mail von Laurent Bernardi).
  59. %%%
  60. %%% Version 4, 1997-01-09, Thomas
  61. %%% -----------------------------
  62. %%%
  63. %%% * Habe die Komplexit�tsaussage f�r sinh, cosh erg�nzt.
  64. %%% * Habe die Versionen der getesteten CAS erg�nzt.
  65. %%%
  66. %%% Version 3, 1997-01-07, Bruno
  67. %%% ----------------------------
  68. %%%
  69. %%% * Meine Firma schreibt sich mit vier Grossbuchstaben.
  70. %%% * Apery schreibt sich m.W. mit einem Akzent.
  71. %%% * Die Fehlermeldung meldet n2-n1>0, nicht n1-n2>0.
  72. %%% * N -> \(N\) (zweimal)
  73. %%% * Leerzeile entfernt nach Display-Formeln, bei denen der Absatz
  74. %%% weitergeht. Hat den Effekt eines \noindent.
  75. %%% * Im Abschnitt: arctan(x) for rational x: "another way" -> "the fastest way"
  76. %%% * "[87]" -> "\cite{87}"
  77. %%% * Das Cohen-Villegas-Zagier brauchen wir nun doch nicht zu zitieren.
  78. %%% * Die "Note:" am Ende von Abschnitt ueber die Gamma-Funktion optisch
  79. %%% verkleinert.
  80. %%% * Die Formel fuer die hypergeometrische Funktionen optisch verschoenert.
  81. %%% Andere Formeln ebenso.
  82. %%% * Figure 2, erste Spalte rechtsbuendig.
  83. %%% * "out performs" -> "outperforms"
  84. %%% * "the streng" -> "the strength"
  85. %%% * Hinweis auf die Parallelisierbarkeit im Abstract.
  86. %%% * Bibtex-Style gehackt, damit nicht jeder zweite Autor auf seine
  87. %%% Anfangsbuchstaben verkuerzt und alleinstehende Autoren ihres
  88. %%% Vornamens beraubt werden.
  89. %%%
  90. %%% Version 2, 1997-01-06, Thomas
  91. %%% -----------------------------
  92. %%%
  93. %%% * ge�nderte Abschnitte sind auskommentiert mit %%%. Alle
  94. %%% �nderungen sind als Vorschlag zu verstehen. Der Grund
  95. %%% wird im folgenden angegeben. Falls Du mit einer �nderung
  96. %%% einverstanden bist, kannst Du alle %%%-Zeilen l�schen.
  97. %%%
  98. %%% * Lyx defines wurden entfernt. Grund: das ISSAC-acmconf.sty
  99. %%% erlaubt keine fremde macros. �bersetzung mit LaTeX geht.
  100. %%% * habe Keyboardumlaute (�,�,�,�) in LaTeX umgeschrieben.
  101. %%% Grund: damit die Submission in einem file geht.
  102. %%% * Habe fontenc und psfig usepackage Befehle entfernt.
  103. %%% Grund: fonts bestimmt acmconf.sty und keine Bilder vorhanden.
  104. %%% * Habe bibliography mit BibTeX (binsplit.bib) und acm.bst
  105. %%% erstellt. Grund: wird von ISSAC '97 verlangt.
  106. %%% * Habe langen Formeln in einer eqnarray Umgebung gesteckt.
  107. %%% Grund: acmconf.sty l�uft im twocolumn-Modus. Lange Formeln
  108. %%% haben die Ausgabe durcheinander gebracht.
  109. %%% * Habe Reihenfolge bei der Beschreibung der elementare
  110. %%% Funktionen ge�ndert, soda� zuerst die rationale und dann
  111. %%% die reelle version beschrieben wird. Grund: Einheitlichkeit.
  112. %%% * Habe sinh mit binary-splitting gegen sinh mit exp-Berechnung
  113. %%% getestet. Sie sind ungef�hr gleich gut auf meinen Pentium,
  114. %%% mir machen "Wackler" beim cosh. cosh ist ab und zu, sowohl
  115. %%% bei kleiner als auch bei gro�e Pr�zision langsamer. Habe
  116. %%% dies und dem Abschnitt sinh, cosh ausgef�llt. Grund: es hat
  117. %%% gefehlt.
  118. %%% * Habe artanh Abschnitt entfernt. Grund: ich habe in der
  119. %%% Einleitung der elementaren Funktionen darauf verwiesen, da�
  120. %%% man die Berechnung anderer Funktionen (wie artanh) auf die
  121. %%% hier erw�hnte zur�ckf�hren oder auf analoger Weise
  122. %%% implementieren kann. Ich denke man braucht nicht alles explizit
  123. %%% anzugeben.
  124. %%%
  125. %%% * Habe Dein Dankesch�n an mich entfernt.
  126. %%% * Habe Abschnitt �ber Konvergenzbeschleunigung entfernt.
  127. %%% Grund: das geht in Dein MathComp paper.
  128. %%%
  129. %%% * Habe neue Formel f�r pi eingef�gt. Grund: einfacher,
  130. %%% effizienter und stimmt mit der angegebenen Referenz
  131. %%% �berein.
  132. %%% * Habe die Berechnung der Apery Konstante angepasst.
  133. %%% Grund: die hier angegebenen Formel wurde mit einer
  134. %%% umgeformten Reihe berechnet. Wenn man dieses nicht
  135. %%% kennt, wirkt es verwirrend. Keine Effizienz-steigerung.
  136. %%% * Habe die Beschreibung f�r die erste version der Euler
  137. %%% Konstante entfernt. Grund: wird von der zweiten version
  138. %%% in jeder Hinsicht (Beweis, Effizienz) gedeckt.
  139. %%% * Habe Abschnitte �ber Checkpointing und Parallelisierung
  140. %%% eingef�gt. Ein Beispiel �ber die Wirksamkeit habe ich
  141. %%% bei der Apery Konstante angegeben. Grund: damit k�nnen wir
  142. %%% das Paper auch bei PASCO '97 einreichen.
  143. %%%
  144. %%% * Habe Beispiel-C++-Implementierung f�r abpq Reihen eingef�gt.
  145. %%% Grund: zeigen wie einfach es ist wenn man die Formeln hat ;-)
  146. %%% * Habe Beispiel-Implementierung f�r abpqcd Reihen eingef�gt.
  147. %%% Grund: dito
  148. %%% * Habe Computational results und Conclusions Abschnitt eingef�gt.
  149. %%% * Habe die Namen der Konstanten (C, G, ...) and die entsprechenden
  150. %%% Abschnitten eingef�gt. Grund: diese Namen werden bei den
  151. %%% Tabellen im Abschnitt Computational results benutzt.
  152. %%% * Habe Verweis an LiDIA eingef�gt. Grund: wird bei Computational
  153. %%% results erw\"ahnt.
  154. %%%
  155. %%% Version 1, 1996-11-30, Bruno
  156. %%%
  157. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  158. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  159. %%% Bruno Haible, Thomas Papanikolaou. %%%%
  160. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  161. \documentstyle{acmconf}
  162. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  163. % Plain TeX macros
  164. %\catcode`@=11 % @ ist ab jetzt ein gewoehnlicher Buchstabe
  165. %\def\@Re{\qopname@{Re}} \def\re#1{{\@Re #1}}
  166. %\def\@Im{\qopname@{Im}} \def\im#1{{\@Im #1}}
  167. %\catcode`@=12 % @ ist ab jetzt wieder ein Sonderzeichen
  168. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  169. % LaTeX2e macros
  170. \catcode`@=11 % @ ist ab jetzt ein gewoehnlicher Buchstabe
  171. \def\re{\mathop{\operator@font Re}\nolimits}
  172. \def\im{\mathop{\operator@font Im}\nolimits}
  173. \def\artanh{\mathop{\operator@font artanh}\nolimits}
  174. \catcode`@=12 % @ ist ab jetzt wieder ein Sonderzeichen
  175. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  176. \begin{document}
  177. \title{Fast multiprecision evaluation of series of rational numbers}
  178. \author{
  179. \begin{tabular}{ccc}
  180. {Bruno Haible} & \hspace*{2cm} & {Thomas Papanikolaou}\\
  181. {\normalsize ILOG} && {\normalsize Laboratoire A2X}\\
  182. {\normalsize 9, rue de Verdun} && {\normalsize 351, cours de la Lib\'eration}\\
  183. {\normalsize F -- 94253 Gentilly Cedex} && {\normalsize F -- 33405 Talence Cedex}\\
  184. {\normalsize {\tt haible@ilog.fr}} && {\normalsize {\tt papanik@math.u-bordeaux.fr}}\\
  185. \end{tabular}
  186. }
  187. \maketitle
  188. \begin{abstract}
  189. We describe two techniques for fast multiple-precision evaluation of linearly
  190. convergent series, including power series and Ramanujan series. The computation
  191. time for \(N\) bits is \( O((\log N)^{2}M(N)) \), where \( M(N) \) is the time
  192. needed to multiply two \(N\)-bit numbers. Applications include fast algorithms
  193. for elementary functions, \(\pi\), hypergeometric functions at rational points,
  194. $\zeta(3)$, Euler's, Catalan's and Ap{\'e}ry's constant. The algorithms are
  195. suitable for parallel computation.
  196. \end{abstract}
  197. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  198. \section{Introduction}
  199. Multiple-precision evaluation of real numbers has become efficiently possible
  200. since Sch\"onhage and Strassen \cite{71} have showed that the bit complexity of
  201. the multiplication of two \(N\)-bit numbers is
  202. \( M(N)=O(N\:\log N\:\log\log N) \).
  203. This is not only a theoretical result; a C++ implementation \cite{96a} can
  204. exploit this already for \( N=40000 \) bits. Algorithms for computing
  205. elementary functions (exp, log, sin, cos, tan, asin, acos, atan, sinh, cosh,
  206. tanh, arsinh, arcosh, artanh) have appeared in \cite{76b}, and a remarkable
  207. algorithm for \( \pi \) was found by Brent and Salamin \cite{76c}.
  208. However, all these algorithms suffer from the fact that calculated results
  209. are not reusable, since the computation is done using real arithmetic (using
  210. exact rational arithmetic would be extremely inefficient). Therefore functions
  211. or constants have to be recomputed from the scratch every time higher precision
  212. is required.
  213. In this note, we present algorithms for fast computation of sums of the form
  214. \[S=\sum _{n=0}^{\infty }R(n)F(0)\cdots F(n)\]
  215. where \( R(n) \) and \( F(n) \) are rational functions in \( n \) with rational
  216. coefficients, provided that this sum is linearly convergent, i.e. that the
  217. \( n \)-th term is \( O(c^{-n}) \) with \( c>1 \). Examples include elementary
  218. and hypergeometric functions at rational points in the {\em interior} of the
  219. circle of convergence, as well as \( \pi \) and Euler's, Catalan's and
  220. Ap{\'e}ry's constants.
  221. The presented algorithms are {\em easy to implement} and {\em extremely
  222. efficient}, since they take advantage of pure integer arithmetic. The
  223. calculated results are {\em exact}, making {\em checkpointing} and
  224. {\em reuse} of computations possible. Finally,
  225. the computation of our algorithms {\em can be easily parallelised}.
  226. After publishing the present paper, we were informed that the results of
  227. section 2 were already published by E.~Karatsuba in \cite{91,91b,93,95c}.
  228. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  229. \section{Evaluation of linearly convergent series}
  230. The technique presented here applies to all linearly convergent sums of the
  231. form
  232. \[ S=\sum ^{\infty }_{n=0}
  233. \frac{a(n)}{b(n)}\frac{p(0)\cdots p(n)}{q(0)\cdots q(n)}\]
  234. where \( a(n) \), \( b(n) \), \( p(n) \), \( q(n) \) are integers with
  235. \( O(\log n) \) bits. The most often used case is that \( a(n) \), \( b(n) \),
  236. \( p(n) \), \( q(n) \) are polynomials in \( n \) with integer coefficients.
  237. \begin{description}
  238. \item [Algorithm:]~
  239. \end{description}
  240. Given two index bounds \( n_{1} \) and \( n_{2} \), consider the partial sum
  241. \[
  242. S=\sum _{n_{1}\leq n<n_{2}}
  243. \frac{a(n)}{b(n)}\frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
  244. It is not computed directly. Instead, we compute the integers
  245. \( P={p(n_{1})}\cdots {p(n_{2}-1)} \), \( Q={q(n_{1})}\cdots {q(n_{2}-1)} \),
  246. \( B={b(n_{1})}\cdots {b(n_{2}-1)} \) and \( T=BQS \). If \( n_{2}-n_{1}<5 \),
  247. these are computed directly. If \( n_{2}-n_{1}\geq 5 \), they are computed
  248. using {\em binary splitting}: Choose an index \( n_{m} \) in the middle of
  249. \( n_{1} \) and \( n_{2} \), compute the components \( P_{l} \), \( Q_{l} \),
  250. \( B_{l} \), \( T_{l} \) belonging to the interval \( n_{1}\leq n<n_{m} \),
  251. compute the components \( P_{r} \), \( Q_{r} \), \( B_{r} \), \( T_{r} \)
  252. belonging to the interval \( n_{m}\leq n<n_{2} \), and set
  253. \( P=P_{l}P_{r} \), \( Q=Q_{l}Q_{r} \), \( B=B_{l}B_{r} \) and
  254. \( T=B_{r}Q_{r}T_{l}+B_{l}P_{l}T_{r} \).
  255. Finally, this algorithm is applied to \( n_{1}=0 \) and
  256. \( n_{2}=n_{\max }=O(N) \), and a final floating-point division
  257. \( S=\frac{T}{BQ} \) is performed.
  258. \begin{description}
  259. \item [Complexity:]~
  260. \end{description}
  261. The bit complexity of computing \( S \) with \( N \) bits of precision is
  262. \( O((\log N)^{2}M(N)) \).
  263. \begin{description}
  264. \item [Proof:]~
  265. \end{description}
  266. Since we have assumed the series to be linearly convergent, the \( n \)-th
  267. term is \( O(c^{-n}) \) with \( c>1 \). Hence choosing
  268. \( n_{\max }=N\frac{\log 2}{\log c}+O(1) \) will ensure that the round-off
  269. error is \( <2^{-N} \). By our assumption that \( a(n) \), \( b(n) \),
  270. \( p(n) \), \( q(n) \) are integers with \( O(\log n) \) bits, the integers
  271. \( P \), \( Q \), \( B \), \( T \) belonging to the interval
  272. \( n_{1}\leq n<n_{2} \) all have \( O((n_{2}-n_{1})\log n_{2}) \) bits.
  273. The algorithm's recursion depth is \( d=\frac{\log n_{\max }}{\log 2}+O(1) \).
  274. At recursion depth \( k \) (\( 1\leq k\leq d \)), integers having each
  275. \( O(\frac{n_{\max }}{2^{k}}\log n_{\max }) \) bits are multiplied. Thus,
  276. the entire computation time \( t \) is
  277. \begin{eqnarray*}
  278. t &=& \sum ^{d}_{k=1}
  279. 2^{k-1}O\left( M\left( \frac{n_{\max }}{2^{k}}\log n_{\max }\right)\right)\\
  280. &=& \sum ^{d}_{k=1} O\left( M\left( n_{\max }\log n_{\max }\right) \right)\\
  281. &=& O(\log n_{\max }M(n_{\max }\log n_{\max }))
  282. \end{eqnarray*}
  283. Because of \( n_{\max }=O( \frac{N}{\log c}) \) and
  284. \begin{eqnarray*}
  285. M\left(\frac{N}{\log c} \log \frac{N}{\log c}\right)
  286. &=& O\left(\frac{1}{\log c} N\, (\log N)^{2}\, \log \log N\right)\\
  287. &=& O\left(\frac{1}{\log c} \log N\, M(N)\right)
  288. \end{eqnarray*}
  289. we have
  290. \[ t = O\left(\frac{1}{\log c} (\log N)^{2}M(N)\right) \]
  291. Considering \(c\) as constant, this is the desired result.
  292. \begin{description}
  293. \item [Checkpointing/Parallelising:]~
  294. \end{description}
  295. A checkpoint can be easily done by storing the (integer) values of
  296. \( n_1 \), \( n_2 \), \( P \), \( Q \), \( B \) and \( T \).
  297. Similarly, if \( m \) processors are available, then the interval
  298. \( [0,n_{max}] \) can be divided into \( m \) pieces of length
  299. \( l = \lfloor n_{max}/m \rfloor \). After each processor \( i \) has
  300. computed the sum of its interval \( [il,(i+1)l] \), the partial sums are
  301. combined to the final result using the rules described above.
  302. \begin{description}
  303. \item [Note:]~
  304. \end{description}
  305. For the special case \( a(n)=b(n)=1 \), the binary splitting algorithm has
  306. already been documented in \cite{76a}, section 6, and \cite{87}, section 10.2.3.
  307. Explicit computation of \( P \), \( Q \), \( B \), \( T \) is only required
  308. as a recursion base, for \( n_{2}-n_{1}<2 \), but avoiding recursions for
  309. \( n_{2}-n_{1}<5 \) gains some percent of execution speed.
  310. The binary splitting algorithm is asymptotically faster than step-by-step
  311. evaluation of the sum -- which has binary complexity \( O(N^{2}) \) -- because
  312. it pushes as much multiplication work as possible to the region where
  313. multiplication becomes efficient. If the multiplication were implemented
  314. as an \( M(N)=O(N^{2}) \) algorithm, the binary splitting algorithm would
  315. provide no speedup over step-by-step evaluation.
  316. \begin{description}
  317. \item [Implementation:]~
  318. \end{description}
  319. In the following we present a simplified C++ implementation of
  320. the above algorithm\footnote{A complete implementation can be found in
  321. CLN \cite{96a}. The implementation of the binary-splitting method will
  322. be also available in {\sf LiDIA-1.4}}. The initialisation is done by a
  323. structure {\tt abpq\_series} containing arrays {\tt a}, {\tt b}, {\tt p}
  324. and {\tt q} of multiprecision integers ({\tt bigint}s). The values of
  325. the arrays at the index \( n \) correspond to the values of the functions
  326. \( a \), \( b \), \( p \) and \( q \) at the integer point \( n \).
  327. The (partial) results of the algorithm are stored in the
  328. {\tt abpq\_series\_result} structure.
  329. \begin{verbatim}
  330. // abpq_series is initialised by user
  331. struct { bigint *a, *b, *p, *q;
  332. } abpq_series;
  333. // abpq_series_result holds the partial results
  334. struct { bigint P, Q, B, T;
  335. } abpq_series_result;
  336. // binary splitting summation for abpq_series
  337. void sum_abpq(abpq_series_result & r,
  338. int n1, int n2,
  339. const abpq_series & arg)
  340. {
  341. // check the length of the summation interval
  342. switch (n2 - n1)
  343. {
  344. case 0:
  345. error_handler("summation device",
  346. "sum_abpq:: n2-n1 should be > 0.");
  347. break;
  348. case 1: // the result at the point n1
  349. r.P = arg.p[n1];
  350. r.Q = arg.q[n1];
  351. r.B = arg.b[n1];
  352. r.T = arg.a[n1] * arg.p[n1];
  353. break;
  354. // cases 2, 3, 4 left out for simplicity
  355. default: // the general case
  356. // the left and the right partial sum
  357. abpq_series_result L, R;
  358. // find the middle of the interval
  359. int nm = (n1 + n2) / 2;
  360. // sum left side
  361. sum_abpq(L, n1, nm, arg);
  362. // sum right side
  363. sum_abpq(R, nm, n2, arg);
  364. // put together
  365. r.P = L.P * R.P;
  366. r.Q = L.Q * R.Q;
  367. r.B = L.B * R.B;
  368. r.T = R.B * R.Q * L.T + L.B * L.P * R.T;
  369. break;
  370. }
  371. }
  372. \end{verbatim}
  373. Note that the multiprecision integers could be replaced here by integer
  374. polynomials, or by any other ring providing the operators \( = \) (assignment),
  375. \( + \) (addition) and \( * \) (multiplication). For example, one could regard
  376. a bivariate polynomial over the integers as a series over the second variable,
  377. with polynomials over the first variable as its coefficients. This would result
  378. an accelerated algorithm for summing bivariate (and thus multivariate)
  379. polynomials.
  380. \subsection{Example: The factorial}
  381. This is the most classical example of the binary splitting algorithm and was
  382. probably known long before \cite{87}.
  383. Computation of the factorial is best done using the binary splitting algorithm,
  384. combined with a reduction of the even factors into odd factors and
  385. multiplication with a power of 2, according to the formula
  386. \[
  387. n!=2^{n-\sigma _{2}(n)}\cdot \prod _{k\geq 1}
  388. \left( \prod _{\frac{n}{2^{k}}<2m+1\leq \frac{n}{2^{k-1}}}(2m+1)\right) ^{k}\]
  389. and where the products
  390. \[
  391. P(n_{1},n_{2})=\prod _{n_{1}<m\leq n_{2}}(2m+1)\]
  392. are evaluated according to the binary splitting algorithm:
  393. \( P(n_{1},n_{2})=P(n_{1},n_{m})P(n_{m},n_{2}) \) with
  394. \( n_{m}=\left\lfloor \frac{n_{1}+n_{2}}{2}\right\rfloor \)
  395. if \( n_{2}-n_{1}\geq 5 \).
  396. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  397. \subsection{Example: Elementary functions at rational points}
  398. The binary splitting algorithm can be applied to the fast computation of the
  399. elementary functions at rational points \( x=\frac{u}{v} \), simply
  400. by using the power series. We present how this can be done for
  401. \( \exp (x) \), \( \ln (x) \), \( \sin (x) \), \( \cos (x) \),
  402. \( \arctan (x) \), \( \sinh (x) \) and \( \cosh (x) \). The calculation
  403. of other elementary functions is similar (or it can be reduced to the
  404. calculation of these functions).
  405. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  406. \subsubsection{\( \exp (x) \) for rational \( x \)}
  407. This is a direct application of the above algorithm with \( a(n)=1 \),
  408. \( b(n)=1 \), \( p(0)=q(0)=1 \), and \( p(n)=u \), \( q(n)=nv \) for \( n>0 \).
  409. Because the series is not only linearly convergent -- \( \exp (x) \) is an
  410. entire function --, \( n_{\max }=O(\frac{N}{\log N + \log \frac{1}{|x|}}) \),
  411. hence the bit complexity is
  412. \[ O\left(\frac{(\log N)^2}{\log N + \log \frac{1}{|x|}} M(N)\right) \]
  413. Considering \(x\) as constant, this is \( O(\log N\: M(N)) \).
  414. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  415. \subsubsection{\( \exp (x) \) for real \( x \)}
  416. This can be computed using the addition theorem for exp, by a trick due to
  417. Brent \cite{76a} (see also \cite{87}, section 10.2, exercise 8). Write
  418. \[
  419. x=x_{0}+\sum _{k=0}^{\infty }\frac{u_{k}}{v_{k}}\]
  420. with \( x_{0} \) integer, \( v_{k}=2^{2^{k}} \) and
  421. \( |u_{k}|<2^{2^{k-1}} \), and compute
  422. \[
  423. \exp (x)=
  424. \exp (x_{0})\cdot \prod _{k\geq 0}\exp \left( \frac{u_{k}}{v_{k}}\right) \]
  425. This algorithm has bit complexity
  426. \[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
  427. = O((\log N)^{2}M(N)) \]
  428. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  429. \subsubsection{ \( \ln (x) \) for rational \( x \)}
  430. For rational \( |x-1|<1 \), the binary splitting algorithm can also be applied
  431. directly to the power series for \( \ln (x) \). Write \( x-1=\frac{u}{v} \)
  432. and compute the series with \( a(n)=1 \), \( b(n)=n+1 \), \( q(n)=v \),
  433. \( p(0)=u \), and \( p(n)=-u \) for \( n>0 \).
  434. This algorithm has bit complexity \( O((\log N)^{2}M(N)) \).
  435. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  436. \subsubsection{\( \ln (x) \) for real \( x \)}
  437. This can be computed using the ``inverse'' Brent trick:
  438. Start with \( y:=0 \).
  439. As long as \( x\neq 1 \) within the actual precision, choose \( k \)
  440. maximal with \( |x-1|<2^{-k} \). Put \( z=2^{-2k}\left[ 2^{2k}(x-1)\right] \),
  441. i.e. let \( z \) contain the first \( k \) significant bits of \( x-1 \).
  442. \( z \) is a good approximation for \( \ln (x) \). Set \( y:=y+z \) and
  443. \( x:=x\cdot \exp (-z) \).
  444. Since \( x\cdot \exp (y) \) is an invariant of the algorithm, the final
  445. \( y \) is the desired value \( \ln (x) \).
  446. This algorithm has bit complexity
  447. \[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
  448. = O((\log N)^{2}M(N)) \]
  449. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  450. \subsubsection{ \( \sin (x) \), \( \cos (x) \) for rational \( x \)}
  451. These are direct applications of the binary splitting algorithm: For
  452. \( \sin (x) \), put \( a(n)=1 \), \( b(n)=1 \), \( p(0)=u \),
  453. \( q(0)=v \), and \( p(n)=-u^{2} \), \( q(n)=(2n)(2n+1)v^{2} \) for
  454. \( n>0 \). For \( \cos (x) \), put \( a(n)=1 \), \( b(n)=1 \),
  455. \( p(0)=1 \), \( q(0)=1 \), and \( p(n)=-u^{2} \), \( q(n)=(2n-1)(2n)v^{2} \)
  456. for \( n>0 \). Of course, when both \( \sin (x) \) and \( \cos (x) \) are
  457. needed, one should only compute
  458. \( \sin (x) \) this way, and then set
  459. \( \cos (x)=\pm \sqrt{1-\sin (x)^{2}} \). This is a 20\% speedup at least.
  460. The bit complexity of these algorithms is \( O(\log N\: M(N)) \).
  461. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  462. \subsubsection{ \( \sin (x) \), \( \cos (x) \) for real \( x \)}
  463. To compute \( \cos (x)+i\sin (x)=\exp (ix) \) for real \( x \), again the
  464. addition theorems and Brent's trick
  465. can be used. The resulting algorithm has bit complexity
  466. \( O((\log N)^{2}M(N)) \).
  467. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  468. \subsubsection{ \( \arctan (x) \) for rational \( x \)}
  469. For rational \( |x|<1 \), the fastest way to compute \( \arctan (x) \) with
  470. bit complexity \( O((\log N)^{2}M(N)) \) is
  471. to apply the binary splitting algorithm directly to the power series
  472. for \( \arctan (x) \). Put \( a(n)=1 \), \( b(n)=2n+1 \), \( q(n)=1 \),
  473. \( p(0)=x \) and \( p(n)=-x^{2} \) for \( n>0 \).
  474. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  475. \subsubsection{ \( \arctan (x) \) for real \( x \)}
  476. This again can be computed using the ``inverse'' Brent trick:
  477. Start out with \( z:=\frac{1}{\sqrt{1+x^{2}}}+i\frac{x}{\sqrt{1+x^{2}}} \)
  478. and \( \varphi :=0 \). During the algorithm \( z \) will be a complex number
  479. with \( |z|=1 \) and \( \re (z)>0 \).
  480. As long as \( \im (z)\neq 0 \) within the actual precision, choose \( k \)
  481. maximal with \( |\im (z)|<2^{-k} \).
  482. Put \( \alpha =2^{-2k}\left[ 2^{2k}\im (z)\right] \), i.e. let \( \alpha \)
  483. contain the first \( k \) significant bits of \( \im (z) \). \( \alpha \)
  484. is a good approximation for \( \arcsin (\im (z)) \). Set
  485. \( \varphi :=\varphi +\alpha \) and \( z:=z\cdot \exp (-i\alpha ) \).
  486. Since \( z\cdot \exp (i\varphi ) \) is an invariant of the algorithm, the
  487. final \( \varphi \) is the desired
  488. value \( \arcsin \frac{x}{\sqrt{1+x^{2}}} \).
  489. This algorithm has bit complexity
  490. \[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
  491. = O((\log N)^{2}M(N)) \]
  492. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  493. \subsubsection{\( \sinh (x) \), \( \cosh (x) \) for rational and real \( x \)}
  494. These can be computed by similar algorithms as \( \sin (x) \) and
  495. \( \cos (x) \) above, with the same asymptotic bit complexity. The
  496. standard computation, using \( \exp (x) \) and its reciprocal (calculated
  497. by the Newton method) results also to the same complexity and works equally
  498. well in practice.
  499. The bit complexity of these algorithms is \( O(\log N\: M(N)) \) for rational
  500. \( x \) and \( O((\log N)^{2}M(N)) \) for real \( x \).
  501. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  502. \subsection{Example: Hypergeometric functions at rational points}
  503. The binary splitting algorithm is well suited for the evaluation of a
  504. hypergeometric series
  505. \[
  506. F\left( \begin{array}{ccc}
  507. a_{1}, & \ldots , & a_{r}\\
  508. b_{1}, & \ldots , & b_{s}
  509. \end{array}
  510. \big| x\right) =\sum ^{\infty }_{n=0}
  511. \frac{a_{1}^{\overline{n}}\cdots
  512. a_{r}^{\overline{n}}}{b_{1}^{\overline{n}}\cdots b_{s}^{\overline{n}}}x^{n}\]
  513. with rational coefficients \( a_{1} \), ..., \( a_{r} \), \( b_{1} \),
  514. ..., \( b_{s} \) at a rational point \( x \) in the interior of the circle of
  515. convergence. Just put \( a(n)=1 \), \( b(n)=1 \), \( p(0)=q(0)=1 \), and
  516. \( \frac{p(n)}{q(n)}=\frac{(a_{1}+n-1)\cdots
  517. (a_{r}+n-1)x}{(b_{1}+n-1)\cdots (b_{s}+n-1)} \) for \( n>0 \). The evaluation
  518. can thus be done with
  519. bit complexity \( O((\log N)^{2}M(N)) \) for
  520. \( r=s \) and \( O(\log N\: M(N)) \) for \( r<s \).
  521. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  522. \subsection{Example: \( \pi \)}
  523. The Ramanujan series for \( \pi \)
  524. \[
  525. \frac{1}{\pi }=\frac{12}{C^{3/2}}\sum^{\infty }_{n=0}
  526. \frac{(-1)^n(6n)!(A+nB)}{(3n)!n!^{3}C^{3n}}\]
  527. with \( A=13591409 \), \( B=545140134 \), \( C=640320 \) found by the
  528. Chudnovsky's \footnote{A special case of \cite{87}, formula (5.5.18),
  529. with N=163.} and which is used by the {\sf LiDIA} \cite{95,97,97b} and the Pari
  530. \cite{95b} system to compute \( \pi \), is usually written as an algorithm
  531. of bit complexity \( O(N^{2}) \). It is, however, possible to apply
  532. binary splitting to the sum. Put \( a(n)=A+nB \), \( b(n)=1 \),
  533. \( p(0)=1 \), \( q(0)=1 \), and \( p(n)=-(6n-5)(2n-1)(6n-1) \),
  534. \( q(n)=n^{3}C^3/24 \) for \( n>0 \). This reduces the complexity to
  535. \( O((\log N)^{2}M(N)) \). Although this is theoretically slower than
  536. Brent-Salamin's quadratically convergent iteration, which has a bit
  537. complexity of \( O(\log N\: M(N)) \), in practice the binary splitted
  538. Ramanujan sum is three times faster than Brent-Salamin, at least in the
  539. range from \( N=1000 \) bits to \( N=1000000 \) bits.
  540. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  541. %%% \subsection{Example: Catalan's constant}
  542. \subsection{Example: Catalan's constant \( G \)}
  543. A linearly convergent sum for Catalan's constant
  544. \[
  545. G:=\sum ^{\infty }_{n=0}\frac{(-1)^{n}}{(2n+1)^{2}}\]
  546. is given in \cite{87}, p. 386:
  547. \[
  548. G = \frac{3}{8}\sum ^{\infty }_{n=0}\frac{1}{{2n \choose n} (2n+1)^{2}}
  549. +\frac{\pi }{8}\log (2+\sqrt{3})
  550. \]
  551. The series is summed using binary splitting, putting \( a(n)=1 \),
  552. \( b(n)=2n+1 \), \( p(0)=1 \), \( q(0)=1 \), and
  553. \( p(n)=n \), \( q(n)=2(2n+1) \) for \( n>0 \). Thus
  554. \( G \) can be computed with bit complexity \( O((\log N)^{2}M(N)) \).
  555. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  556. \subsection{Example: The Gamma function at rational points}
  557. For evaluating \( \Gamma (s) \) for rational \( s \), we first reduce \( s \)
  558. to the range \( 1\leq s\leq 2 \) by the formula \( \Gamma (s+1)=s\Gamma (s) \).
  559. To compute \( \Gamma (s) \) with a precision of \( N \) bits, choose a
  560. positive integer \( x \) with \( xe^{-x}<2^{-N} \). Partial integration lets
  561. us write
  562. \begin{eqnarray*}
  563. \Gamma (s)&=& \int ^{\infty }_{0}e^{-t}t^{s-1}dt\\
  564. &=& x^{s}e^{-x}\:\sum ^{\infty }_{n=0}
  565. \frac{x^{n}}{s(s+1)\cdots (s+n)}
  566. +\int^{\infty }_{x}e^{-t}t^{s-1}dt\\
  567. \end{eqnarray*}
  568. The last integral is \( <xe^{-x}<2^{-N} \). The series is evaluated as a
  569. hypergeometric function (see above); the number of terms to be summed up is
  570. \( O(N) \), since \( x=O(N) \). Thus the entire computation can be done with
  571. bit complexity \( O((\log N)^{2}M(N)) \).
  572. \begin{description}
  573. \item [Note:]~
  574. \end{description}
  575. This result is already mentioned in \cite{76b}.
  576. E.~Karatsuba \cite{91} extends this result to \( \Gamma (s) \) for algebraic
  577. \( s \).
  578. For \( \Gamma (s) \) there is no checkpointing possible because of the
  579. dependency on \( x \) in the binary splitting.
  580. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  581. \subsection{Example: The Riemann Zeta value \( \zeta (3) \) \label{zeta}}
  582. Recently, Doron Zeilberger's method of ``creative telescoping'' has been
  583. applied to Riemann's zeta function at \( s=3 \) (see \cite{96c}), which is
  584. also known as {\em Ap{\'e}ry's constant}:
  585. \[
  586. \zeta (3)=
  587. \frac{1}{2}\sum ^{\infty }_{n=1}
  588. \frac{(-1)^{n-1}(205n^{2}-160n+32)}{n^{5}{2n \choose n}^{5}}
  589. \]
  590. This sum consists of three hypergeometric series. Binary splitting can also be
  591. applied directly, by putting \( a(n)=205n^{2}+250n+77 \), \( b(n)=1 \),
  592. \( p(0)=1 \), \( p(n)=-n^{5} \) for \( n>0 \), and \( q(n)=32(2n+1)^{5} \).
  593. Thus the bit complexity of computing \( \zeta (3) \) is
  594. \( O((\log N)^{2}M(N)) \).
  595. \begin{description}
  596. \item [Note:]~
  597. \end{description}
  598. Using this the authors were able to establish a new record in the
  599. calculation of \( \zeta (3) \) by computing 1,000,000 decimals \cite{96d}.
  600. The computation took 8 hours on a Hewlett Packard 9000/712 machine. After
  601. distributing on a cluster of 4 HP 9000/712 machines the same computation
  602. required only 2.5 hours. The half hour was necessary for reading the partial
  603. results from disk and for recombining them. Again, we have used binary-splitting
  604. for recombining: the 4 partial result produced 2 results which were combined
  605. to the final 1,000,000 decimals value of \( \zeta (3) \).
  606. This example shows the importance of checkpointing. Even if a machine crashes
  607. through the calculation, the results of the other machines are still usable.
  608. Additionally, being able to parallelise the computation reduced the computing
  609. time dramatically.
  610. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  611. \section{Evaluation of linearly convergent series of sums}
  612. The technique presented in the previous section also applies to all linearly
  613. convergent sums of the form
  614. \[
  615. U=\sum ^{\infty }_{n=0}\frac{a(n)}{b(n)}\left( \frac{c(0)}{d(0)}+\cdots
  616. +\frac{c(n)}{d(n)}\right) \frac{p(0)\cdots p(n)}{q(0)\cdots q(n)}\]
  617. where \( a(n) \), \( b(n) \), \( c(n) \), \( d(n) \), \( p(n) \),
  618. \( q(n) \) are integers with \( O(\log n) \) bits. The most often
  619. used case is again that \( a(n) \), \( b(n) \), \( c(n) \), \( d(n) \),
  620. \( p(n) \), \( q(n) \) are polynomials in \( n \) with
  621. integer coefficients.
  622. \begin{description}
  623. \item [Algorithm:]~
  624. \end{description}
  625. Given two index bounds \( n_{1} \)and \( n_{2} \), consider the partial sums
  626. \[
  627. S=\sum _{n_{1}\leq n<n_{2}}\frac{a(n)}{b(n)}
  628. \frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
  629. and
  630. \[
  631. U=\sum _{n_{1}\leq n<n_{2}}\frac{a(n)}{b(n)}
  632. \left( \frac{c(n_{1})}{d(n_{1})}+\cdots +\frac{c(n)}{d(n)}\right)
  633. \frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
  634. As above, we compute the integers \( P={p(n_{1})}\cdots {p(n_{2}-1)} \),
  635. \( Q={q(n_{1})}\cdots {q(n_{2}-1)} \), \( B={b(n_{1})}\cdots {b(n_{2}-1)} \),
  636. \( T=BQS \), \( D={d(n_{1})}\cdots {d(n_{2}-1)} \),
  637. \( C=D\left( \frac{c(n_{1})}{d(n_{1})}+\cdots +
  638. \frac{c(n_{2}-1)}{d(n_{2}-1)}\right) \) and \( V=DBQU \).
  639. If \( n_{2}-n_{1}<4 \), these
  640. are computed directly. If \( n_{2}-n_{1}\geq 4 \), they are computed using
  641. {\em binary splitting}: Choose an index \( n_{m} \) in the middle of
  642. \( n_{1} \)and \( n_{2} \), compute the components \( P_{l} \), \( Q_{l} \),
  643. \( B_{l} \), \( T_{l} \), \( D_{l} \), \( C_{l} \), \( V_{l} \) belonging
  644. to the interval \( n_{1}\leq n<n_{m} \), compute the components \( P_{r} \),
  645. \( Q_{r} \), \( B_{r} \), \( T_{r} \), \( D_{r} \), \( C_{r} \),
  646. \( V_{r} \) belonging to the interval \( n_{m}\leq n<n_{2} \), and set
  647. \( P=P_{l}P_{r} \), \( Q=Q_{l}Q_{r} \), \( B=B_{l}B_{r} \),
  648. \( T=B_{r}Q_{r}T_{l}+B_{l}P_{l}T_{r} \), \( D=D_{l}D_{r} \),
  649. \( C=C_{l}D_{r}+C_{r}D_{l} \) and
  650. \( V=D_{r}B_{r}Q_{r}V_{l}+D_{r}C_{l}B_{l}P_{l}T_{r}+D_{l}B_{l}P_{l}V_{r} \).
  651. Finally, this algorithm is applied to \( n_{1}=0 \) and
  652. \( n_{2}=n_{\max}=O(N) \), and final floating-point divisions
  653. \( S=\frac{T}{BQ} \) and \( U=\frac{V}{DBQ} \) are performed.
  654. \begin{description}
  655. \item [Complexity:]~
  656. \end{description}
  657. The bit complexity of computing \( S \) and \( U \) with \( N \) bits of
  658. precision is \( O((\log N)^{2}M(N)) \).
  659. \begin{description}
  660. \item [Proof:]~
  661. \end{description}
  662. By our assumption that \( a(n) \), \( b(n) \), \( c(n) \), \( d(n) \),
  663. \( p(n) \), \( q(n) \) are integers with \( O(\log n) \) bits,
  664. the integers \( P \), \( Q \), \( B \), \( T \), \( D \), \( C \),
  665. \( V \) belonging to the interval \( n_{1}\leq n<n_{2} \) all have
  666. \( O((n_{2}-n_{1})\log n_{2}) \) bits. The rest of the proof is as in the
  667. previous section.
  668. \begin{description}
  669. \item [Checkpointing/Parallelising:]~
  670. \end{description}
  671. A checkpoint can be easily done by storing the (integer) values of
  672. \( n_1 \), \( n_2 \), \( P \), \( Q \), \( B \), \( T \) and additionally
  673. \( D \), \( C \), \( V \). Similarly, if \( m \) processors are available,
  674. then the interval \( [0,n_{max}] \) can be divided into \( m \) pieces of
  675. length \( l = \lfloor n_{max}/m \rfloor \). After each processor \( i \) has
  676. computed the sum of its interval \( [il,(i+1)l] \), the partial sums are
  677. combined to the final result using the rules described above.
  678. \begin{description}
  679. \item [Implementation:]~
  680. \end{description}
  681. The C++ implementation of the above algorithm is very similar
  682. to the previous one. The initialisation is done now by a structure
  683. {\tt abpqcd\_series} containing arrays {\tt a}, {\tt b}, {\tt p},
  684. {\tt q}, {\tt c} and {\tt d} of multiprecision integers. The values of
  685. the arrays at the index \( n \) correspond to the values of the functions
  686. \( a \), \( b \), \( p \), \( q \), \( c \) and \( d \) at the integer point
  687. \( n \). The (partial) results of the algorithm are stored in the
  688. {\tt abpqcd\_series\_result} structure, which now contains 3 new elements
  689. ({\tt C}, {\tt D} and {\tt V}).
  690. \begin{verbatim}
  691. // abpqcd_series is initialised by user
  692. struct { bigint *a, *b, *p, *q, *c, *d;
  693. } abpqcd_series;
  694. // abpqcd_series_result holds the partial results
  695. struct { bigint P, Q, B, T, C, D, V;
  696. } abpqcd_series_result;
  697. void sum_abpqcd(abpqcd_series_result & r,
  698. int n1, int n2,
  699. const abpqcd_series & arg)
  700. {
  701. switch (n2 - n1)
  702. {
  703. case 0:
  704. error_handler("summation device",
  705. "sum_abpqcd:: n2-n1 should be > 0.");
  706. break;
  707. case 1: // the result at the point n1
  708. r.P = arg.p[n1];
  709. r.Q = arg.q[n1];
  710. r.B = arg.b[n1];
  711. r.T = arg.a[n1] * arg.p[n1];
  712. r.D = arg.d[n1];
  713. r.C = arg.c[n1];
  714. r.V = arg.a[n1] * arg.c[n1] * arg.p[n1];
  715. break;
  716. // cases 2, 3, 4 left out for simplicity
  717. default: // general case
  718. // the left and the right partial sum
  719. abpqcd_series_result L, R;
  720. // find the middle of the interval
  721. int nm = (n1 + n2) / 2;
  722. // sum left side
  723. sum_abpqcd(L, n1, nm, arg);
  724. // sum right side
  725. sum_abpqcd(R, nm, n2, arg);
  726. // put together
  727. r.P = L.P * R.P;
  728. r.Q = R.Q * L.Q;
  729. r.B = L.B * R.B;
  730. bigint tmp = L.B * L.P * R.T;
  731. r.T = R.B * R.Q * L.T + tmp;
  732. r.D = L.D * R.D;
  733. r.C = L.C * R.D + R.C * L.D;
  734. r.V = R.D * (R.B * R.Q * L.V + L.C * tmp)
  735. + L.D * L.B * L.P * R.V;
  736. break;
  737. }
  738. }
  739. \end{verbatim}
  740. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  741. \subsection{Example: Euler's constant \( C \) \label{eulergamma}}
  742. \begin{description}
  743. \item [Theorem:]~
  744. \end{description}
  745. Let \( f(x)=\sum ^{\infty }_{n=0}\frac{x^{n}}{n!^{2}} \) and
  746. \( g(x)=\sum ^{\infty }_{n=0}H_{n}\frac{x^{n}}{n!^{2}} \). Then for
  747. \( x\rightarrow \infty \),
  748. \( \frac{g(x)}{f(x)}=\frac{1}{2}\log x+C+O\left( e^{-4\sqrt{x}}\right) \).
  749. \begin{description}
  750. \item [Proof:]~
  751. \end{description}
  752. The Laplace method for asymptotic evaluation of exponentially growing
  753. sums and integrals yields
  754. \[
  755. f(x)=
  756. e^{2\sqrt{x}}x^{-\frac{1}{4}}\frac{1}{2\sqrt{\pi }}(1+O(x^{-\frac{1}{4}}))\]
  757. and
  758. \[
  759. g(x)=e^{2\sqrt{x}}x^{-\frac{1}{4}}\frac{1}{2\sqrt{\pi }}
  760. \left(\frac{1}{2}\log x+C+O(\log x\cdot x^{-\frac{1}{4}})\right)\]
  761. On the other hand, \( h(x):=\frac{g(x)}{f(x)} \) satisfies the
  762. differential equation
  763. \[
  764. xf(x)\cdot h''(x)+(2xf'(x)+f(x))\cdot h'(x)=f'(x)\]
  765. hence
  766. \[
  767. h(x)=\frac{1}{2}\log x+C+c_{2}
  768. \int ^{\infty }_{x}\frac{1}{tf(t)^{2}}dt=\frac{1}{2}\log x+C+O(e^{-4\sqrt{x}})\]
  769. \begin{description}
  770. \item [Algorithm:]~
  771. \end{description}
  772. To compute \( C \) with a precision of \( N \) bits, set
  773. \[ x=\left\lceil (N+2)\: \frac{\log 2}{4}\right\rceil ^{2} \]
  774. and evaluate the series for \( g(x) \) and \( f(x) \) simultaneously,
  775. using the binary-splitting algorithm,
  776. with \( a(n)=1 \), \( b(n)=1 \), \( c(n)=1 \), \( d(n)=n+1 \),
  777. \( p(n)=x \), \( q(n)=(n+1)^{2} \). Let \( \alpha =3.591121477\ldots \)
  778. be the solution of the equation \( -\alpha \log \alpha +\alpha +1=0 \). Then
  779. \( \alpha \sqrt{x}-\frac{1}{4\log \alpha }\log \sqrt{x}+O(1) \)
  780. terms of the series suffice for the relative error to be bounded
  781. by \( 2^{-N} \).
  782. \begin{description}
  783. \item [Complexity:]~
  784. \end{description}
  785. The bit complexity of this algorithm is \( O((\log N)^{2}M(N)) \).
  786. \begin{description}
  787. \item [Note:]~
  788. \end{description}
  789. This algorithm was first mentioned in \cite{80}. It is by far
  790. the fastest known algorithm for computing Euler's constant.
  791. For Euler's constant there is no checkpointing possible because
  792. of the dependency on \( x \) in the binary splitting.
  793. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  794. \section{Computational results}
  795. In this section we present some computational results of our CLN and
  796. {\sf LiDIA} implementation of the algorithms presented in this note. We use
  797. the official version (1.3) and an experimental version (1.4a) of {\sf LiDIA}.
  798. We have taken advantage of {\sf LiDIA}'s ability to replace its kernel
  799. (multiprecision arithmetic and memory management) \cite{95,97,97b}, so we were
  800. able to use in both cases CLN's fast integer arithmetic routines.
  801. \subsection{Timings}
  802. The table in Figure \ref{Fig1} shows the running times for the calculation of
  803. \( \exp(1) \), \( \log(2) \), \( \pi \), \( C \), \( G \) and \( \zeta(3) \)
  804. to precision 100, 1000, 10000 and 100000 decimal digits. The timings are given
  805. in seconds and they denote the {\em real} time needed, i.e. system and user
  806. time. The computation was done on an Intel Pentium with 133Hz and 32MB of RAM.
  807. \begin{figure}[htb]
  808. \begin{center}
  809. \begin{tabular}{|l|l|l|l|l|l|l|}
  810. \hline
  811. D &\( \exp(1) \)&\( \log(2) \)&\( \pi \)&\( C \) &\( G \)&\( \zeta(3) \)\\
  812. \hline
  813. \hline
  814. \( 10^2 \) &0.0005 & 0.0020 &0.0014 & 0.0309 &0.0179 & 0.0027 \\
  815. \hline
  816. \( 10^3 \) &0.0069 & 0.0474 &0.0141 & 0.8110 &0.3580 & 0.0696 \\
  817. \hline
  818. \( 10^4 \) &0.2566 & 1.9100 &0.6750 & 33.190 &13.370 & 2.5600 \\
  819. \hline
  820. \( 10^5 \) &5.5549 & 45.640 &17.430 & 784.93 &340.33 & 72.970 \\
  821. \hline
  822. \end{tabular}
  823. \caption{{\sf LiDIA-1.4a} timings of computation of constants using
  824. binary-splitting}\label{Fig1}
  825. \end{center}
  826. \end{figure}
  827. The second table (Figure \ref{Fig2}) summarizes the performance of
  828. \( exp(x) \) in various Computer Algebra systems\footnote{We do not list
  829. the timings of {\sf LiDIA-1.4a} since these are comparable to those of CLN.}.
  830. For a fair comparison of the algorithms, both argument and precision are
  831. chosen in such a way, that system--specific optimizations (BCD arithmetic
  832. in Maple, FFT multiplication in CLN, special exact argument handling in
  833. {\sf LiDIA}) do not work. We use \( x = -\sqrt{2} \) and precision
  834. \( 10^{(i/3)} \), with \( i \) running from \( 4 \) to \( 15 \).
  835. \begin{figure}[htb]
  836. \begin{center}
  837. \begin{tabular}{|r|l|l|l|l|}
  838. \hline
  839. D & Maple & Pari & {\sf LiDIA-1.3} & CLN \\
  840. \hline
  841. \hline
  842. 21 & 0.00090 & 0.00047 & 0.00191 & 0.00075 \\
  843. \hline
  844. 46 & 0.00250 & 0.00065 & 0.00239 & 0.00109 \\
  845. \hline
  846. 100 & 0.01000 & 0.00160 & 0.00389 & 0.00239 \\
  847. \hline
  848. 215 & 0.03100 & 0.00530 & 0.00750 & 0.00690 \\
  849. \hline
  850. 464 & 0.11000 & 0.02500 & 0.02050 & 0.02991 \\
  851. \hline
  852. 1000 & 0.4000 & 0.2940 & 0.0704 & 0.0861 \\
  853. \hline
  854. 2154 & 1.7190 & 0.8980 & 0.2990 & 0.2527 \\
  855. \hline
  856. 4641 & 8.121 & 5.941 & 1.510 & 0.906 \\
  857. \hline
  858. 10000 & 39.340 & 39.776 & 7.360 & 4.059 \\
  859. \hline
  860. 21544 & 172.499 & 280.207 & 39.900 & 15.010 \\
  861. \hline
  862. 46415 & 868.841 & 1972.184& 129.000 & 39.848 \\
  863. \hline
  864. 100000 & 4873.829 & 21369.197& 437.000 & 106.990 \\
  865. \hline
  866. \end{tabular}
  867. \caption{Timings of computation of \( \exp(-\sqrt{2}) \)}\label{Fig2}
  868. \end{center}
  869. \end{figure}
  870. MapleV R3 is the slowest system in this comparison. This is probably due to
  871. the BCD arithmetic it uses. However, Maple seems to have an asymptotically
  872. better algorithm for \( exp (x) \) for numbers having more than 10000 decimals.
  873. In this range it outperforms Pari-1.39.03, which is the fastest system in the
  874. 0--200 decimals range.
  875. The comparison indicating the strength of binary-splitting is between
  876. {\sf LiDIA-1.3} and CLN itself. Having the same kernel, the only
  877. difference is here that {\sf LiDIA-1.3} uses Brent's \( O(\sqrt{n}M(n)) \)
  878. for \( \exp(x) \), whereas CLN changes from Brent's method to a
  879. binary-splitting version for large numbers.
  880. As expected in the range of 1000--100000 decimals CLN outperforms
  881. {\sf LiDIA-1.3} by far. The fact that {\sf LiDIA-1.2.1} is faster
  882. in the range of 200--1000 decimals (also in some trig. functions)
  883. is probably due to a better optimized \( O(\sqrt{n}M(n)) \) method
  884. for \( \exp(x) \).
  885. \subsection {Distributed computing of \( \zeta (3) \)}
  886. Using the method described in \ref{zeta} the authors were the first to
  887. compute 1,000,000 decimals of \( \zeta (3) \) \cite{96d}.
  888. The computation took 8 hours on a Hewlett Packard 9000/712 machine. After
  889. distributing on a cluster of 4 HP 9000/712 machines the same computation
  890. required only 2.5 hours. The half hour was necessary for reading the partial
  891. results from disk and for recombining them. Again, we have used binary-splitting
  892. for recombining: the 4 partial result produced 2 results which were combined
  893. to the final 1,000,000 decimals value of \( \zeta (3) \).
  894. This example shows the importance of checkpointing. Even if a machine crashes
  895. through the calculation, the results of the other machines are still usable.
  896. Additionally, being able to parallelise the computation reduced the computing
  897. time dramatically.
  898. \subsection{Euler's constant \( C \)}
  899. We have implemented a version of Brent's and McMillan's algorithm \cite{80} and
  900. a version accelerated by binary-splitting as shown in \ref{eulergamma}.
  901. The computation of \( C \) was done twice on a SPARC-Ultra machine
  902. with 167 MHz and 256 MB of RAM. The first computation using the non-acellerated
  903. version required 160 hours. The result of this computation was then verified
  904. by the binary splitting version in (only) 14 hours.
  905. The first 475006 partial quotients of the continued fraction of \( C \)
  906. were computed on an Intel Pentium with 133 MHz and 32 MB of RAM in 3 hours
  907. using a programm by H. te Riele based on \cite{96e}, which was translated to
  908. {\sf LiDIA} for efficiency reasons. Computing the 475006th
  909. convergent produced the following improved theorem:
  910. \medskip
  911. \centerline{If \( C \) is a rational number, \(C=p/q\), then \( |q| > 10^{244663} \)}
  912. \medskip
  913. Details of this computation (including statistics on the partial
  914. quotients) can be found in \cite{98}.
  915. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  916. \section{Conclusions}
  917. Although powerful, the binary splitting method has not been widely used.
  918. Especially, no information existed on the applicability of this method.
  919. In this note we presented a generic binary-splitting summation device for
  920. evaluating two types of linearly convergent series. From this we derived simple
  921. and computationally efficient algorithms for the evaluation of elementary
  922. functions and constants. These algorithms work with {\em exact}
  923. objects, making them suitable for use within Computer Algebra systems.
  924. We have shown that the practical performance of our algorithms is
  925. superior to current system implementations. In addition to existing methods,
  926. our algorithms provide the possibility of checkpointing and parallelising.
  927. These features can be useful for huge calculations, such as those done in
  928. analytic number theory research.
  929. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  930. \section{Thanks}
  931. The authors would like to thank J\"org Arndt, for pointing us to
  932. chapter 10 in \cite{87}. We would also like to thank Richard P. Brent for
  933. his comments and Hermann te Riele for providing us his program for the
  934. continued fraction computation of Euler's constant.
  935. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  936. \bibliography{binsplit}
  937. \bibliographystyle{acm}
  938. \end{document}