You can not select more than 25 topics
			Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
		
		
		
		
		
			
		
			
				
					
					
						
							1130 lines
						
					
					
						
							45 KiB
						
					
					
				
			
		
		
		
			
			
			
				
					
				
				
					
				
			
		
		
	
	
							1130 lines
						
					
					
						
							45 KiB
						
					
					
				
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								%%%
							 | 
						|
								%%% Version 12, 1999-09-07, Bruno
							 | 
						|
								%%% -----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Habe neue Referenzen zu den Karatsuba-Arbeiten hinzugefügt, entnommen
							 | 
						|
								%%%   dem großartigen Paper CECM-98-118 (98_118-Borwein-Bradley-Crandall.dvi).
							 | 
						|
								%%% * Habe in acm.bst, Funktion format.title, das change.case rausgenommen,
							 | 
						|
								%%%   das die Namen von Euler, Riemann u.a. in Kleinbuchstaben konvertiert
							 | 
						|
								%%%   hatte.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 11, 1998-12-20, Bruno
							 | 
						|
								%%% -----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Habe Referenzen zu den Karatsuba-Arbeiten hinzugefügt.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 10, 1998-03-10, Thomas & Bruno
							 | 
						|
								%%% --------------------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Korrigiere die Formel für a(n) bei zeta(3).
							 | 
						|
								%%% * Schönheitsfehler im Literaturverzeichnis.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 9, 1998-03-06, Thomas & Bruno
							 | 
						|
								%%% -------------------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Schreibe \frac{1}{|x|} statt \frac{1}{x}.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 8, 1998-01-16b, Thomas
							 | 
						|
								%%% ------------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Drei Literaturverweise für LiDIA statt nur einem.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 7, 1998-01-16a, Bruno
							 | 
						|
								%%% -----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Adresse: Praefix F fuer Frankreich
							 | 
						|
								%%% * Abstract: Erwaehne zeta(3)
							 | 
						|
								%%% * Kleinere Korrekturen der O()-Abschaetzungen
							 | 
						|
								%%%
							 | 
						|
								%%% Version 6, 1998-01-14, Thomas
							 | 
						|
								%%% -----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * habe meine Adresse ge"andert.
							 | 
						|
								%%%
							 | 
						|
								%%% * habe Resultat f"ur die Euler Konstante + Kettenbruch eingef"uhrt
							 | 
						|
								%%%
							 | 
						|
								%%% Version 5, 1997-12-11, Thomas
							 | 
						|
								%%% -----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Habe die Anzahl der Scritte bei der Euler Konstante von
							 | 
						|
								%%%   x = ceiling(N log(2)/4)^2 auf x = ceiling((N+2) log(2)/4)^2
							 | 
						|
								%%%   hochgesetzt (Mail von Sofroniou) 
							 | 
						|
								%%%
							 | 
						|
								%%% * Habe Kommentar eingef"ugt bzgl Checkpointing bei der Euler
							 | 
						|
								%%%   Konstante und Gamma(x) (Mail von Sofroniou)
							 | 
						|
								%%%
							 | 
						|
								%%% * Habe Section zu Geschwindigkeit von Maple gegen"uber Pari
							 | 
						|
								%%%   verbessert (mail von Laurent Bernardi).
							 | 
						|
								%%%
							 | 
						|
								%%% Version 4, 1997-01-09, Thomas
							 | 
						|
								%%% -----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Habe die Komplexitätsaussage für sinh, cosh ergänzt.
							 | 
						|
								%%% * Habe die Versionen der getesteten CAS ergänzt.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 3, 1997-01-07, Bruno
							 | 
						|
								%%% ----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * Meine Firma schreibt sich mit vier Grossbuchstaben.
							 | 
						|
								%%% * Apery schreibt sich m.W. mit einem Akzent.
							 | 
						|
								%%% * Die Fehlermeldung meldet n2-n1>0, nicht n1-n2>0.
							 | 
						|
								%%% * N -> \(N\) (zweimal)
							 | 
						|
								%%% * Leerzeile entfernt nach Display-Formeln, bei denen der Absatz
							 | 
						|
								%%%   weitergeht. Hat den Effekt eines \noindent.
							 | 
						|
								%%% * Im Abschnitt: arctan(x) for rational x: "another way" -> "the fastest way"
							 | 
						|
								%%% * "[87]" -> "\cite{87}"
							 | 
						|
								%%% * Das Cohen-Villegas-Zagier brauchen wir nun doch nicht zu zitieren.
							 | 
						|
								%%% * Die "Note:" am Ende von Abschnitt ueber die Gamma-Funktion optisch
							 | 
						|
								%%%   verkleinert.
							 | 
						|
								%%% * Die Formel fuer die hypergeometrische Funktionen optisch verschoenert.
							 | 
						|
								%%%   Andere Formeln ebenso.
							 | 
						|
								%%% * Figure 2, erste Spalte rechtsbuendig.
							 | 
						|
								%%% * "out performs" -> "outperforms"
							 | 
						|
								%%% * "the streng" -> "the strength"
							 | 
						|
								%%% * Hinweis auf die Parallelisierbarkeit im Abstract.
							 | 
						|
								%%% * Bibtex-Style gehackt, damit nicht jeder zweite Autor auf seine
							 | 
						|
								%%%   Anfangsbuchstaben verkuerzt und alleinstehende Autoren ihres
							 | 
						|
								%%%   Vornamens beraubt werden.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 2, 1997-01-06, Thomas
							 | 
						|
								%%% -----------------------------
							 | 
						|
								%%%
							 | 
						|
								%%% * geänderte Abschnitte sind auskommentiert mit %%%. Alle
							 | 
						|
								%%%   Änderungen sind als Vorschlag zu verstehen. Der Grund
							 | 
						|
								%%%   wird im folgenden angegeben. Falls Du mit einer Änderung
							 | 
						|
								%%%   einverstanden bist, kannst Du alle %%%-Zeilen löschen.
							 | 
						|
								%%%   
							 | 
						|
								%%% * Lyx defines wurden entfernt. Grund: das ISSAC-acmconf.sty
							 | 
						|
								%%%   erlaubt keine fremde macros. Übersetzung mit LaTeX geht.
							 | 
						|
								%%% * habe Keyboardumlaute (ä,ü,ö,ß) in LaTeX umgeschrieben.
							 | 
						|
								%%%   Grund: damit die Submission in einem file geht.
							 | 
						|
								%%% * Habe fontenc und psfig usepackage Befehle entfernt. 
							 | 
						|
								%%%   Grund: fonts bestimmt acmconf.sty und keine Bilder vorhanden.
							 | 
						|
								%%% * Habe bibliography mit BibTeX (binsplit.bib) und acm.bst
							 | 
						|
								%%%   erstellt. Grund: wird von ISSAC '97 verlangt.
							 | 
						|
								%%% * Habe langen Formeln in einer eqnarray Umgebung gesteckt.
							 | 
						|
								%%%   Grund: acmconf.sty läuft im twocolumn-Modus. Lange Formeln
							 | 
						|
								%%%   haben die Ausgabe durcheinander gebracht.
							 | 
						|
								%%% * Habe Reihenfolge bei der Beschreibung der elementare
							 | 
						|
								%%%   Funktionen geändert, sodaß zuerst die rationale und dann
							 | 
						|
								%%%   die reelle version beschrieben wird. Grund: Einheitlichkeit.
							 | 
						|
								%%% * Habe sinh mit binary-splitting gegen sinh mit exp-Berechnung
							 | 
						|
								%%%   getestet. Sie sind ungefähr gleich gut auf meinen Pentium,
							 | 
						|
								%%%   mir machen "Wackler" beim cosh. cosh ist ab und zu, sowohl
							 | 
						|
								%%%   bei kleiner als auch bei große Präzision langsamer. Habe
							 | 
						|
								%%%   dies und dem Abschnitt sinh, cosh ausgefüllt. Grund: es hat
							 | 
						|
								%%%   gefehlt.
							 | 
						|
								%%% * Habe artanh Abschnitt entfernt. Grund: ich habe in der
							 | 
						|
								%%%   Einleitung der elementaren Funktionen darauf verwiesen, daß
							 | 
						|
								%%%   man die Berechnung anderer Funktionen (wie artanh) auf die
							 | 
						|
								%%%   hier erwähnte zurückführen oder auf analoger Weise
							 | 
						|
								%%%   implementieren kann. Ich denke man braucht nicht alles explizit 
							 | 
						|
								%%%   anzugeben.
							 | 
						|
								%%% 
							 | 
						|
								%%% * Habe Dein Dankeschön an mich entfernt.
							 | 
						|
								%%% * Habe Abschnitt über Konvergenzbeschleunigung entfernt.
							 | 
						|
								%%%   Grund: das geht in Dein MathComp paper.
							 | 
						|
								%%% 
							 | 
						|
								%%% * Habe neue Formel für pi eingefügt. Grund: einfacher,
							 | 
						|
								%%%   effizienter und stimmt mit der angegebenen Referenz
							 | 
						|
								%%%   überein.
							 | 
						|
								%%% * Habe die Berechnung der Apery Konstante angepasst.
							 | 
						|
								%%%   Grund: die hier angegebenen Formel wurde mit einer
							 | 
						|
								%%%   umgeformten Reihe berechnet. Wenn man dieses nicht
							 | 
						|
								%%%   kennt, wirkt es verwirrend. Keine Effizienz-steigerung.
							 | 
						|
								%%% * Habe die Beschreibung für die erste version der Euler
							 | 
						|
								%%%   Konstante entfernt. Grund: wird von der zweiten version
							 | 
						|
								%%%   in jeder Hinsicht (Beweis, Effizienz) gedeckt. 
							 | 
						|
								%%% * Habe Abschnitte über Checkpointing und Parallelisierung
							 | 
						|
								%%%   eingefügt. Ein Beispiel über die Wirksamkeit habe ich
							 | 
						|
								%%%   bei der Apery Konstante angegeben. Grund: damit können wir
							 | 
						|
								%%%   das Paper auch bei PASCO '97 einreichen. 
							 | 
						|
								%%%   
							 | 
						|
								%%% * Habe Beispiel-C++-Implementierung für abpq Reihen eingefügt.
							 | 
						|
								%%%   Grund: zeigen wie einfach es ist wenn man die Formeln hat ;-)
							 | 
						|
								%%% * Habe Beispiel-Implementierung für abpqcd Reihen eingefügt.
							 | 
						|
								%%%   Grund: dito
							 | 
						|
								%%% * Habe Computational results und Conclusions Abschnitt eingefügt.
							 | 
						|
								%%% * Habe die Namen der Konstanten (C, G, ...) and die entsprechenden
							 | 
						|
								%%%   Abschnitten eingefügt. Grund: diese Namen werden bei den
							 | 
						|
								%%%   Tabellen im Abschnitt Computational results benutzt.
							 | 
						|
								%%% * Habe Verweis an LiDIA eingefügt. Grund: wird bei Computational
							 | 
						|
								%%%   results erw\"ahnt.
							 | 
						|
								%%%
							 | 
						|
								%%% Version 1, 1996-11-30, Bruno
							 | 
						|
								%%% 
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								%%% Bruno Haible, Thomas Papanikolaou.                                     %%%%
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								
							 | 
						|
								\documentstyle{acmconf}
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								% Plain TeX macros
							 | 
						|
								%\catcode`@=11 % @ ist ab jetzt ein gewoehnlicher Buchstabe
							 | 
						|
								%\def\@Re{\qopname@{Re}}      \def\re#1{{\@Re #1}}
							 | 
						|
								%\def\@Im{\qopname@{Im}}      \def\im#1{{\@Im #1}}
							 | 
						|
								%\catcode`@=12 % @ ist ab jetzt wieder ein Sonderzeichen
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								% LaTeX2e macros
							 | 
						|
								\catcode`@=11 % @ ist ab jetzt ein gewoehnlicher Buchstabe
							 | 
						|
								\def\re{\mathop{\operator@font Re}\nolimits}
							 | 
						|
								\def\im{\mathop{\operator@font Im}\nolimits}
							 | 
						|
								\def\artanh{\mathop{\operator@font artanh}\nolimits}
							 | 
						|
								\catcode`@=12 % @ ist ab jetzt wieder ein Sonderzeichen
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\begin{document}
							 | 
						|
								
							 | 
						|
								\title{Fast multiprecision evaluation of series of rational numbers}
							 | 
						|
								
							 | 
						|
								\author{
							 | 
						|
								\begin{tabular}{ccc}
							 | 
						|
								{Bruno Haible} & \hspace*{2cm} & {Thomas Papanikolaou}\\
							 | 
						|
								{\normalsize ILOG} && {\normalsize Laboratoire A2X}\\
							 | 
						|
								{\normalsize 9, rue de Verdun} && {\normalsize 351, cours de la Lib\'eration}\\
							 | 
						|
								{\normalsize F -- 94253 Gentilly Cedex} && {\normalsize F -- 33405 Talence Cedex}\\
							 | 
						|
								{\normalsize {\tt haible@ilog.fr}} && {\normalsize {\tt papanik@math.u-bordeaux.fr}}\\
							 | 
						|
								\end{tabular}
							 | 
						|
								}
							 | 
						|
								
							 | 
						|
								\maketitle
							 | 
						|
								
							 | 
						|
								\begin{abstract}
							 | 
						|
								
							 | 
						|
								We describe two techniques for fast multiple-precision evaluation of linearly
							 | 
						|
								convergent series, including power series and Ramanujan series. The computation
							 | 
						|
								time for \(N\) bits is  \( O((\log N)^{2}M(N)) \), where \( M(N) \) is the time
							 | 
						|
								needed to multiply two \(N\)-bit numbers. Applications include fast algorithms
							 | 
						|
								for elementary functions,  \(\pi\), hypergeometric functions at rational points,
							 | 
						|
								$\zeta(3)$, Euler's, Catalan's and Ap{\'e}ry's constant. The algorithms are
							 | 
						|
								suitable for parallel computation.
							 | 
						|
								
							 | 
						|
								\end{abstract}
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\section{Introduction}
							 | 
						|
								
							 | 
						|
								Multiple-precision evaluation of real numbers has become efficiently possible
							 | 
						|
								since Sch\"onhage and Strassen \cite{71} have showed that the bit complexity of
							 | 
						|
								the multiplication of two \(N\)-bit numbers is
							 | 
						|
								\( M(N)=O(N\:\log N\:\log\log N) \).
							 | 
						|
								This is not only a theoretical result; a C++ implementation \cite{96a} can
							 | 
						|
								exploit this already for  \( N=40000 \) bits. Algorithms for computing
							 | 
						|
								elementary functions (exp, log, sin, cos, tan, asin, acos, atan, sinh, cosh,
							 | 
						|
								tanh, arsinh, arcosh, artanh) have appeared in \cite{76b}, and a remarkable
							 | 
						|
								algorithm for \( \pi \) was found by Brent and Salamin \cite{76c}.
							 | 
						|
								
							 | 
						|
								However, all these algorithms suffer from the fact that calculated results
							 | 
						|
								are not reusable, since the computation is done using real arithmetic (using
							 | 
						|
								exact rational arithmetic would be extremely inefficient). Therefore functions
							 | 
						|
								or constants have to be recomputed from the scratch every time higher precision
							 | 
						|
								is required.
							 | 
						|
								
							 | 
						|
								In this note, we present algorithms for fast computation of sums of the form
							 | 
						|
								
							 | 
						|
								\[S=\sum _{n=0}^{\infty }R(n)F(0)\cdots F(n)\]
							 | 
						|
								where \( R(n) \) and \( F(n) \) are rational functions in \( n \) with rational
							 | 
						|
								coefficients, provided that this sum is linearly convergent, i.e. that the
							 | 
						|
								\( n \)-th term is \( O(c^{-n}) \) with  \( c>1 \). Examples include elementary
							 | 
						|
								and hypergeometric functions at rational points in the {\em interior} of the
							 | 
						|
								circle of convergence, as well as \( \pi  \) and Euler's, Catalan's and
							 | 
						|
								Ap{\'e}ry's constants.
							 | 
						|
								
							 | 
						|
								The presented algorithms are {\em easy to implement} and {\em extremely
							 | 
						|
								efficient}, since they take advantage of pure integer arithmetic. The
							 | 
						|
								calculated results are {\em exact}, making {\em checkpointing} and
							 | 
						|
								{\em reuse} of computations possible. Finally,
							 | 
						|
								the computation of our algorithms {\em can be easily parallelised}.
							 | 
						|
								
							 | 
						|
								After publishing the present paper, we were informed that the results of
							 | 
						|
								section 2 were already published by E.~Karatsuba in \cite{91,91b,93,95c}.
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\section{Evaluation of linearly convergent series}
							 | 
						|
								
							 | 
						|
								The technique presented here applies to all linearly convergent sums of the
							 | 
						|
								form
							 | 
						|
								
							 | 
						|
								\[ S=\sum ^{\infty }_{n=0}
							 | 
						|
								\frac{a(n)}{b(n)}\frac{p(0)\cdots p(n)}{q(0)\cdots q(n)}\]
							 | 
						|
								where  \( a(n) \),  \( b(n) \),  \( p(n) \),  \( q(n) \) are integers with
							 | 
						|
								\( O(\log n) \) bits. The most often used case is that  \( a(n) \), \( b(n) \),
							 | 
						|
								\( p(n) \),  \( q(n) \) are polynomials in  \( n \) with integer coefficients.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Algorithm:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								Given two index bounds  \( n_{1} \) and  \( n_{2} \), consider the partial sum 
							 | 
						|
								
							 | 
						|
								\[
							 | 
						|
								S=\sum _{n_{1}\leq n<n_{2}}
							 | 
						|
								\frac{a(n)}{b(n)}\frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
							 | 
						|
								
							 | 
						|
								It is not computed directly. Instead, we compute the integers
							 | 
						|
								\( P={p(n_{1})}\cdots {p(n_{2}-1)} \),  \( Q={q(n_{1})}\cdots {q(n_{2}-1)} \),
							 | 
						|
								\( B={b(n_{1})}\cdots {b(n_{2}-1)} \) and  \( T=BQS \). If  \( n_{2}-n_{1}<5 \),
							 | 
						|
								these are computed directly.  If  \( n_{2}-n_{1}\geq 5 \), they are computed
							 | 
						|
								using {\em binary splitting}: Choose an index  \( n_{m} \) in the middle of
							 | 
						|
								\( n_{1} \) and \( n_{2} \), compute the components \( P_{l} \), \( Q_{l} \),
							 | 
						|
								\( B_{l} \), \( T_{l} \) belonging to the interval \( n_{1}\leq n<n_{m} \),
							 | 
						|
								compute the components  \( P_{r} \),  \( Q_{r} \),  \( B_{r} \),  \( T_{r} \)
							 | 
						|
								belonging to the interval  \( n_{m}\leq n<n_{2} \), and set 
							 | 
						|
								\( P=P_{l}P_{r} \),  \( Q=Q_{l}Q_{r} \),  \( B=B_{l}B_{r} \) and
							 | 
						|
								\( T=B_{r}Q_{r}T_{l}+B_{l}P_{l}T_{r} \).
							 | 
						|
								
							 | 
						|
								Finally, this algorithm is applied to  \( n_{1}=0 \) and
							 | 
						|
								\( n_{2}=n_{\max }=O(N) \), and a final floating-point division
							 | 
						|
								\( S=\frac{T}{BQ} \) is performed.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Complexity:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								The bit complexity of computing  \( S \) with  \( N \) bits of precision is 
							 | 
						|
								\( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Proof:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								Since we have assumed the series to be linearly convergent, the  \( n \)-th
							 | 
						|
								term is  \( O(c^{-n}) \) with  \( c>1 \). Hence choosing
							 | 
						|
								\( n_{\max }=N\frac{\log 2}{\log c}+O(1) \) will ensure that the round-off
							 | 
						|
								error is  \( <2^{-N} \). By our assumption that  \( a(n) \),  \( b(n) \),
							 | 
						|
								\( p(n) \),  \( q(n) \) are integers with  \( O(\log n) \) bits, the integers
							 | 
						|
								\( P \),  \( Q \),  \( B \),  \( T \) belonging to the interval
							 | 
						|
								\( n_{1}\leq n<n_{2} \) all have \( O((n_{2}-n_{1})\log n_{2}) \) bits.
							 | 
						|
								
							 | 
						|
								The algorithm's recursion depth is  \( d=\frac{\log n_{\max }}{\log 2}+O(1) \).
							 | 
						|
								At recursion depth  \( k \) (\( 1\leq k\leq d \)), integers having each
							 | 
						|
								\( O(\frac{n_{\max }}{2^{k}}\log n_{\max }) \) bits are multiplied. Thus,
							 | 
						|
								the entire computation time \( t \) is
							 | 
						|
								\begin{eqnarray*}
							 | 
						|
								t &=& \sum ^{d}_{k=1}
							 | 
						|
								2^{k-1}O\left( M\left( \frac{n_{\max }}{2^{k}}\log n_{\max }\right)\right)\\
							 | 
						|
								&=& \sum ^{d}_{k=1} O\left( M\left( n_{\max }\log n_{\max }\right) \right)\\
							 | 
						|
								&=& O(\log n_{\max }M(n_{\max }\log n_{\max }))
							 | 
						|
								\end{eqnarray*}
							 | 
						|
								Because of  \( n_{\max }=O( \frac{N}{\log c}) \) and
							 | 
						|
								\begin{eqnarray*}
							 | 
						|
								M\left(\frac{N}{\log c} \log \frac{N}{\log c}\right)
							 | 
						|
								&=& O\left(\frac{1}{\log c} N\, (\log N)^{2}\, \log \log N\right)\\
							 | 
						|
								&=& O\left(\frac{1}{\log c} \log N\, M(N)\right)
							 | 
						|
								\end{eqnarray*}
							 | 
						|
								we have
							 | 
						|
								\[ t = O\left(\frac{1}{\log c} (\log N)^{2}M(N)\right) \]
							 | 
						|
								Considering \(c\) as constant, this is the desired result.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Checkpointing/Parallelising:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								A checkpoint can be easily done by storing the (integer) values of
							 | 
						|
								\( n_1 \),  \( n_2 \), \( P \),  \( Q \),  \( B \) and \( T \).
							 | 
						|
								Similarly, if \( m \) processors are available, then the interval 
							 | 
						|
								\( [0,n_{max}] \) can be divided into \( m \) pieces of length 
							 | 
						|
								\( l = \lfloor n_{max}/m \rfloor \). After each processor \( i \) has
							 | 
						|
								computed the sum of its interval \( [il,(i+1)l] \), the partial sums are
							 | 
						|
								combined to the final result using the rules described above.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Note:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								For the special case  \( a(n)=b(n)=1 \), the binary splitting algorithm has
							 | 
						|
								already been documented in \cite{76a}, section 6, and \cite{87}, section 10.2.3.
							 | 
						|
								
							 | 
						|
								Explicit computation of  \( P \),  \( Q \),  \( B \),  \( T \) is only required
							 | 
						|
								as a recursion base, for  \( n_{2}-n_{1}<2 \), but avoiding recursions for
							 | 
						|
								\( n_{2}-n_{1}<5 \) gains some percent of execution speed.
							 | 
						|
								
							 | 
						|
								The binary splitting algorithm is asymptotically faster than step-by-step
							 | 
						|
								evaluation of the sum -- which has binary complexity  \( O(N^{2}) \) -- because
							 | 
						|
								it pushes as much multiplication work as possible to the region where
							 | 
						|
								multiplication becomes efficient. If the multiplication were implemented
							 | 
						|
								as an  \( M(N)=O(N^{2}) \) algorithm, the binary splitting algorithm would
							 | 
						|
								provide no speedup over step-by-step evaluation.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Implementation:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								In the following we present a simplified C++ implementation of
							 | 
						|
								the above algorithm\footnote{A complete implementation can be found in 
							 | 
						|
								CLN \cite{96a}. The implementation of the binary-splitting method will
							 | 
						|
								be also available in {\sf LiDIA-1.4}}. The initialisation is done by a
							 | 
						|
								structure {\tt abpq\_series} containing arrays {\tt a}, {\tt b}, {\tt p}
							 | 
						|
								and {\tt q} of multiprecision integers ({\tt bigint}s). The values of
							 | 
						|
								the arrays at the index \( n \) correspond to the values of the functions
							 | 
						|
								\( a \), \( b \), \( p \) and \( q \) at the integer point \( n \).
							 | 
						|
								The (partial) results of the algorithm are stored in the
							 | 
						|
								{\tt abpq\_series\_result} structure.
							 | 
						|
								
							 | 
						|
								\begin{verbatim}
							 | 
						|
								// abpq_series is initialised by user
							 | 
						|
								struct { bigint *a, *b, *p, *q; 
							 | 
						|
								       } abpq_series;
							 | 
						|
								
							 | 
						|
								// abpq_series_result holds the partial results
							 | 
						|
								struct { bigint P, Q, B, T; 
							 | 
						|
								       } abpq_series_result;
							 | 
						|
								
							 | 
						|
								// binary splitting summation for abpq_series
							 | 
						|
								void sum_abpq(abpq_series_result & r, 
							 | 
						|
								              int n1, int n2, 
							 | 
						|
								              const abpq_series & arg)
							 | 
						|
								{
							 | 
						|
								  // check the length of the summation interval
							 | 
						|
								  switch (n2 - n1)
							 | 
						|
								  {
							 | 
						|
								    case 0:
							 | 
						|
								      error_handler("summation device", 
							 | 
						|
								               "sum_abpq:: n2-n1 should be > 0.");
							 | 
						|
								      break;
							 | 
						|
								
							 | 
						|
								    case 1: // the result at the point n1
							 | 
						|
								      r.P = arg.p[n1];
							 | 
						|
								      r.Q = arg.q[n1];
							 | 
						|
								      r.B = arg.b[n1];
							 | 
						|
								      r.T = arg.a[n1] * arg.p[n1];
							 | 
						|
								      break;
							 | 
						|
								
							 | 
						|
								    // cases 2, 3, 4 left out for simplicity
							 | 
						|
								
							 | 
						|
								    default: // the general case
							 | 
						|
								
							 | 
						|
								      // the left and the right partial sum
							 | 
						|
								      abpq_series_result L, R;
							 | 
						|
								
							 | 
						|
								      // find the middle of the interval
							 | 
						|
								      int nm = (n1 + n2) / 2;
							 | 
						|
								
							 | 
						|
								      // sum left side
							 | 
						|
								      sum_abpq(L, n1, nm, arg);
							 | 
						|
								
							 | 
						|
								      // sum right side
							 | 
						|
								      sum_abpq(R, nm, n2, arg);
							 | 
						|
								
							 | 
						|
								      // put together
							 | 
						|
								      r.P = L.P * R.P;
							 | 
						|
								      r.Q = L.Q * R.Q;
							 | 
						|
								      r.B = L.B * R.B;
							 | 
						|
								      r.T = R.B * R.Q * L.T + L.B * L.P * R.T;
							 | 
						|
								      break;
							 | 
						|
								  }
							 | 
						|
								}
							 | 
						|
								\end{verbatim}
							 | 
						|
								
							 | 
						|
								Note that the multiprecision integers could be replaced here by integer
							 | 
						|
								polynomials, or by any other ring providing the operators \( = \) (assignment),
							 | 
						|
								\( + \) (addition) and \( * \) (multiplication). For example, one could regard
							 | 
						|
								a bivariate polynomial over the integers as a series over the second variable,
							 | 
						|
								with polynomials over the first variable as its coefficients. This would result
							 | 
						|
								an accelerated algorithm for summing bivariate (and thus multivariate)
							 | 
						|
								polynomials.
							 | 
						|
								
							 | 
						|
								\subsection{Example: The factorial}
							 | 
						|
								
							 | 
						|
								This is the most classical example of the binary splitting algorithm and was
							 | 
						|
								probably known long before \cite{87}.
							 | 
						|
								
							 | 
						|
								Computation of the factorial is best done using the binary splitting algorithm,
							 | 
						|
								combined with a reduction of the even factors into odd factors and
							 | 
						|
								multiplication with a power of 2, according to the formula
							 | 
						|
								
							 | 
						|
								\[
							 | 
						|
								n!=2^{n-\sigma _{2}(n)}\cdot \prod _{k\geq 1}
							 | 
						|
								\left( \prod _{\frac{n}{2^{k}}<2m+1\leq \frac{n}{2^{k-1}}}(2m+1)\right) ^{k}\]
							 | 
						|
								and where the products 
							 | 
						|
								\[
							 | 
						|
								P(n_{1},n_{2})=\prod _{n_{1}<m\leq n_{2}}(2m+1)\]
							 | 
						|
								are evaluated according to the binary splitting algorithm:
							 | 
						|
								\( P(n_{1},n_{2})=P(n_{1},n_{m})P(n_{m},n_{2}) \) with  
							 | 
						|
								\( n_{m}=\left\lfloor \frac{n_{1}+n_{2}}{2}\right\rfloor  \) 
							 | 
						|
								if  \( n_{2}-n_{1}\geq 5 \).
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsection{Example: Elementary functions at rational points}
							 | 
						|
								
							 | 
						|
								The binary splitting algorithm can be applied to the fast computation of the
							 | 
						|
								elementary functions at rational points  \( x=\frac{u}{v} \), simply 
							 | 
						|
								by using the power series. We present how this can be done for
							 | 
						|
								\( \exp (x) \), \( \ln (x) \), \( \sin (x) \), \( \cos (x) \),
							 | 
						|
								\( \arctan (x) \), \( \sinh (x) \) and \( \cosh (x) \). The calculation
							 | 
						|
								of other elementary functions is similar (or it can be reduced to the
							 | 
						|
								calculation of these functions).
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{\( \exp (x) \) for rational \( x \)}
							 | 
						|
								
							 | 
						|
								This is a direct application of the above algorithm with  \( a(n)=1 \),
							 | 
						|
								\( b(n)=1 \), \( p(0)=q(0)=1 \), and \( p(n)=u \), \( q(n)=nv \) for \( n>0 \).
							 | 
						|
								Because the series is not only linearly convergent -- \( \exp (x) \) is an
							 | 
						|
								entire function --,  \( n_{\max }=O(\frac{N}{\log N + \log \frac{1}{|x|}}) \),
							 | 
						|
								hence the bit complexity is
							 | 
						|
								\[ O\left(\frac{(\log N)^2}{\log N + \log \frac{1}{|x|}} M(N)\right) \]
							 | 
						|
								Considering \(x\) as constant, this is  \( O(\log N\: M(N)) \).
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{\( \exp (x) \) for real \( x \)}
							 | 
						|
								
							 | 
						|
								This can be computed using the addition theorem for exp, by a trick due to
							 | 
						|
								Brent \cite{76a} (see also \cite{87}, section 10.2, exercise 8). Write
							 | 
						|
								
							 | 
						|
								\[
							 | 
						|
								x=x_{0}+\sum _{k=0}^{\infty }\frac{u_{k}}{v_{k}}\]
							 | 
						|
								with  \( x_{0} \) integer,  \( v_{k}=2^{2^{k}} \) and 
							 | 
						|
								\( |u_{k}|<2^{2^{k-1}} \), and compute 
							 | 
						|
								\[
							 | 
						|
								\exp (x)=
							 | 
						|
								\exp (x_{0})\cdot \prod _{k\geq 0}\exp \left( \frac{u_{k}}{v_{k}}\right) \]
							 | 
						|
								
							 | 
						|
								This algorithm has bit complexity
							 | 
						|
								\[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
							 | 
						|
								  = O((\log N)^{2}M(N)) \]
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{ \( \ln (x) \) for rational  \( x \)}
							 | 
						|
								
							 | 
						|
								For rational  \( |x-1|<1 \), the binary splitting algorithm can also be applied
							 | 
						|
								directly to the power series for  \( \ln (x) \). Write  \( x-1=\frac{u}{v} \)
							 | 
						|
								and compute the series with  \( a(n)=1 \),  \( b(n)=n+1 \),  \( q(n)=v \),
							 | 
						|
								\( p(0)=u \), and  \( p(n)=-u \) for  \( n>0 \).
							 | 
						|
								
							 | 
						|
								This algorithm has bit complexity  \( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{\( \ln (x) \) for real \( x \)}
							 | 
						|
								
							 | 
						|
								This can be computed using the ``inverse'' Brent trick:
							 | 
						|
								
							 | 
						|
								Start with  \( y:=0 \).
							 | 
						|
								
							 | 
						|
								As long as  \( x\neq 1 \) within the actual precision, choose  \( k \)
							 | 
						|
								maximal with  \( |x-1|<2^{-k} \). Put \( z=2^{-2k}\left[ 2^{2k}(x-1)\right]  \),
							 | 
						|
								i.e. let  \( z \) contain the first  \( k \) significant bits of  \( x-1 \).
							 | 
						|
								\( z \) is a good approximation for  \( \ln (x) \). Set  \( y:=y+z \) and
							 | 
						|
								\( x:=x\cdot \exp (-z) \).
							 | 
						|
								
							 | 
						|
								Since  \( x\cdot \exp (y) \) is an invariant of the algorithm, the final
							 | 
						|
								\( y \) is the desired value  \( \ln (x) \).
							 | 
						|
								
							 | 
						|
								This algorithm has bit complexity
							 | 
						|
								\[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
							 | 
						|
								  = O((\log N)^{2}M(N)) \]
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{ \( \sin (x) \),  \( \cos (x) \) for rational  \( x \)}
							 | 
						|
								
							 | 
						|
								These are direct applications of the binary splitting algorithm: For
							 | 
						|
								\( \sin (x) \), put  \( a(n)=1 \),  \( b(n)=1 \),  \( p(0)=u \), 
							 | 
						|
								\( q(0)=v \), and  \( p(n)=-u^{2} \),  \( q(n)=(2n)(2n+1)v^{2} \) for
							 | 
						|
								\( n>0 \). For  \( \cos (x) \), put  \( a(n)=1 \),  \( b(n)=1 \),  
							 | 
						|
								\( p(0)=1 \),  \( q(0)=1 \), and  \( p(n)=-u^{2} \),  \( q(n)=(2n-1)(2n)v^{2} \)
							 | 
						|
								for  \( n>0 \). Of course, when both  \( \sin (x) \) and  \( \cos (x) \) are 
							 | 
						|
								needed, one should only compute
							 | 
						|
								 \( \sin (x) \) this way, and then set  
							 | 
						|
								\( \cos (x)=\pm \sqrt{1-\sin (x)^{2}} \). This is a 20\% speedup at least.
							 | 
						|
								
							 | 
						|
								The bit complexity of these algorithms is  \( O(\log N\: M(N)) \).
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{ \( \sin (x) \),  \( \cos (x) \) for real  \( x \)}
							 | 
						|
								
							 | 
						|
								To compute  \( \cos (x)+i\sin (x)=\exp (ix) \) for real  \( x \), again the 
							 | 
						|
								addition theorems and Brent's trick
							 | 
						|
								can be used. The resulting algorithm has bit complexity  
							 | 
						|
								\( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{ \( \arctan (x) \) for rational  \( x \)}
							 | 
						|
								
							 | 
						|
								For rational  \( |x|<1 \), the fastest way to compute  \( \arctan (x) \) with 
							 | 
						|
								bit complexity  \( O((\log N)^{2}M(N)) \) is
							 | 
						|
								to apply the binary splitting algorithm directly to the power series
							 | 
						|
								for  \( \arctan (x) \). Put  \( a(n)=1 \),  \( b(n)=2n+1 \),  \( q(n)=1 \),  
							 | 
						|
								\( p(0)=x \) and  \( p(n)=-x^{2} \) for  \( n>0 \).
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{ \( \arctan (x) \) for real  \( x \)}
							 | 
						|
								
							 | 
						|
								This again can be computed using the ``inverse'' Brent trick:
							 | 
						|
								
							 | 
						|
								Start out with  \( z:=\frac{1}{\sqrt{1+x^{2}}}+i\frac{x}{\sqrt{1+x^{2}}} \) 
							 | 
						|
								and  \( \varphi :=0 \). During the algorithm  \( z \) will be a complex number
							 | 
						|
								with  \( |z|=1 \) and  \( \re (z)>0 \).
							 | 
						|
								
							 | 
						|
								As long as  \( \im (z)\neq 0 \) within the actual precision, choose  \( k \) 
							 | 
						|
								maximal with  \( |\im (z)|<2^{-k} \).
							 | 
						|
								Put  \( \alpha =2^{-2k}\left[ 2^{2k}\im (z)\right]  \), i.e. let  \( \alpha  \) 
							 | 
						|
								contain the first  \( k \) significant bits of  \( \im (z) \).  \( \alpha  \) 
							 | 
						|
								is a good approximation for  \( \arcsin (\im (z)) \). Set  
							 | 
						|
								\( \varphi :=\varphi +\alpha  \) and  \( z:=z\cdot \exp (-i\alpha ) \).
							 | 
						|
								
							 | 
						|
								Since  \( z\cdot \exp (i\varphi ) \) is an invariant of the algorithm, the 
							 | 
						|
								final  \( \varphi  \) is the desired
							 | 
						|
								value  \( \arcsin \frac{x}{\sqrt{1+x^{2}}} \).
							 | 
						|
								
							 | 
						|
								This algorithm has bit complexity
							 | 
						|
								\[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
							 | 
						|
								  = O((\log N)^{2}M(N)) \]
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsubsection{\( \sinh (x) \), \( \cosh (x) \) for rational and real \( x \)}
							 | 
						|
								
							 | 
						|
								These can be computed by similar algorithms as  \( \sin (x) \) and  
							 | 
						|
								\( \cos (x) \) above, with the same asymptotic bit complexity. The
							 | 
						|
								standard computation, using \( \exp (x) \) and its reciprocal (calculated
							 | 
						|
								by the Newton method) results also to the same complexity and works equally
							 | 
						|
								well in practice.
							 | 
						|
								
							 | 
						|
								The bit complexity of these algorithms is  \( O(\log N\: M(N)) \) for rational
							 | 
						|
								\( x \) and \( O((\log N)^{2}M(N)) \) for real \( x \).
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsection{Example: Hypergeometric functions at rational points}
							 | 
						|
								
							 | 
						|
								The binary splitting algorithm is well suited for the evaluation of a
							 | 
						|
								hypergeometric series 
							 | 
						|
								
							 | 
						|
								\[
							 | 
						|
								F\left( \begin{array}{ccc}
							 | 
						|
								a_{1}, & \ldots , & a_{r}\\
							 | 
						|
								b_{1}, & \ldots , & b_{s}
							 | 
						|
								\end{array}
							 | 
						|
								\big| x\right) =\sum ^{\infty }_{n=0}
							 | 
						|
								\frac{a_{1}^{\overline{n}}\cdots 
							 | 
						|
								a_{r}^{\overline{n}}}{b_{1}^{\overline{n}}\cdots b_{s}^{\overline{n}}}x^{n}\]
							 | 
						|
								with rational coefficients  \( a_{1} \), ...,  \( a_{r} \),  \( b_{1} \),
							 | 
						|
								...,  \( b_{s} \) at a rational point  \( x \) in the interior of the circle of
							 | 
						|
								convergence. Just put  \( a(n)=1 \),  \( b(n)=1 \),  \( p(0)=q(0)=1 \), and
							 | 
						|
								\( \frac{p(n)}{q(n)}=\frac{(a_{1}+n-1)\cdots 
							 | 
						|
								(a_{r}+n-1)x}{(b_{1}+n-1)\cdots (b_{s}+n-1)} \) for  \( n>0 \). The evaluation 
							 | 
						|
								can thus be done with
							 | 
						|
								bit complexity  \( O((\log N)^{2}M(N)) \) for  
							 | 
						|
								\( r=s \) and  \( O(\log N\: M(N)) \) for  \( r<s \).
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsection{Example:  \( \pi  \)}
							 | 
						|
								
							 | 
						|
								The Ramanujan series for  \( \pi  \) 
							 | 
						|
								\[
							 | 
						|
								\frac{1}{\pi }=\frac{12}{C^{3/2}}\sum^{\infty }_{n=0}
							 | 
						|
								\frac{(-1)^n(6n)!(A+nB)}{(3n)!n!^{3}C^{3n}}\]
							 | 
						|
								with  \( A=13591409 \),  \( B=545140134 \), \( C=640320 \) found by the
							 | 
						|
								Chudnovsky's \footnote{A special case of \cite{87}, formula (5.5.18),
							 | 
						|
								with N=163.} and which is used by the {\sf LiDIA} \cite{95,97,97b} and the Pari
							 | 
						|
								\cite{95b} system to compute \( \pi \), is usually written as an algorithm
							 | 
						|
								of bit complexity  \( O(N^{2}) \). It is, however, possible to apply
							 | 
						|
								binary splitting to the sum. Put \( a(n)=A+nB \),  \( b(n)=1 \),
							 | 
						|
								\( p(0)=1 \),  \( q(0)=1 \), and \( p(n)=-(6n-5)(2n-1)(6n-1) \),
							 | 
						|
								\( q(n)=n^{3}C^3/24 \) for  \( n>0 \).  This reduces the complexity to 
							 | 
						|
								\( O((\log N)^{2}M(N)) \). Although this is theoretically slower than
							 | 
						|
								Brent-Salamin's quadratically convergent iteration, which has a bit
							 | 
						|
								complexity of  \( O(\log N\: M(N)) \), in practice the binary splitted
							 | 
						|
								Ramanujan sum is three times faster than Brent-Salamin, at least in the
							 | 
						|
								range from  \( N=1000 \) bits to \( N=1000000 \) bits.
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								%%% \subsection{Example: Catalan's constant}
							 | 
						|
								
							 | 
						|
								\subsection{Example: Catalan's constant \( G \)}
							 | 
						|
								
							 | 
						|
								A linearly convergent sum for Catalan's constant 
							 | 
						|
								\[
							 | 
						|
								G:=\sum ^{\infty }_{n=0}\frac{(-1)^{n}}{(2n+1)^{2}}\]
							 | 
						|
								is given in \cite{87}, p. 386:
							 | 
						|
								\[
							 | 
						|
								G = \frac{3}{8}\sum ^{\infty }_{n=0}\frac{1}{{2n \choose n} (2n+1)^{2}}
							 | 
						|
								    +\frac{\pi }{8}\log (2+\sqrt{3})
							 | 
						|
								\]
							 | 
						|
								
							 | 
						|
								The series is summed using binary splitting, putting \( a(n)=1 \),  
							 | 
						|
								\( b(n)=2n+1 \),  \( p(0)=1 \),  \( q(0)=1 \), and
							 | 
						|
								\( p(n)=n \),  \( q(n)=2(2n+1) \) for  \( n>0 \). Thus  
							 | 
						|
								\( G \) can be computed with bit complexity  \( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsection{Example: The Gamma function at rational points}
							 | 
						|
								
							 | 
						|
								For evaluating  \( \Gamma (s) \) for rational  \( s \), we first reduce \( s \)
							 | 
						|
								to the range \( 1\leq s\leq 2 \) by the formula \( \Gamma (s+1)=s\Gamma (s) \).
							 | 
						|
								To compute  \( \Gamma (s) \) with a precision of  \( N \) bits, choose a
							 | 
						|
								positive integer  \( x \) with  \( xe^{-x}<2^{-N} \). Partial integration lets
							 | 
						|
								us write 
							 | 
						|
								
							 | 
						|
								\begin{eqnarray*}
							 | 
						|
								\Gamma (s)&=& \int ^{\infty }_{0}e^{-t}t^{s-1}dt\\
							 | 
						|
								          &=& x^{s}e^{-x}\:\sum ^{\infty }_{n=0}
							 | 
						|
								             \frac{x^{n}}{s(s+1)\cdots (s+n)}
							 | 
						|
								             +\int^{\infty }_{x}e^{-t}t^{s-1}dt\\
							 | 
						|
								\end{eqnarray*}
							 | 
						|
								The last integral is  \( <xe^{-x}<2^{-N} \). The series is evaluated as a
							 | 
						|
								hypergeometric function (see above); the number of terms to be summed up is
							 | 
						|
								\( O(N) \), since \( x=O(N) \). Thus the entire computation can be done with
							 | 
						|
								bit complexity  \( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Note:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								This result is already mentioned in \cite{76b}.
							 | 
						|
								
							 | 
						|
								E.~Karatsuba \cite{91} extends this result to \( \Gamma (s) \) for algebraic
							 | 
						|
								\( s \).
							 | 
						|
								
							 | 
						|
								For \( \Gamma (s) \) there is no checkpointing possible because of the
							 | 
						|
								dependency on \( x \) in the binary splitting.
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsection{Example: The Riemann Zeta value \( \zeta (3) \) \label{zeta}}
							 | 
						|
								
							 | 
						|
								Recently, Doron Zeilberger's method of ``creative telescoping'' has been
							 | 
						|
								applied to Riemann's zeta function at  \( s=3 \) (see \cite{96c}), which is
							 | 
						|
								also known as {\em Ap{\'e}ry's constant}: 
							 | 
						|
								\[
							 | 
						|
								\zeta (3)=
							 | 
						|
								\frac{1}{2}\sum ^{\infty }_{n=1}
							 | 
						|
								               \frac{(-1)^{n-1}(205n^{2}-160n+32)}{n^{5}{2n \choose n}^{5}}
							 | 
						|
								\]
							 | 
						|
								
							 | 
						|
								This sum consists of three hypergeometric series. Binary splitting can also be
							 | 
						|
								applied directly, by putting  \( a(n)=205n^{2}+250n+77 \), \( b(n)=1 \),
							 | 
						|
								\( p(0)=1 \), \( p(n)=-n^{5} \) for  \( n>0 \), and \( q(n)=32(2n+1)^{5} \).
							 | 
						|
								Thus the bit complexity of computing \( \zeta (3) \) is
							 | 
						|
								\( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Note:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								Using this the authors were able to establish a new record in the
							 | 
						|
								calculation of \( \zeta (3) \) by computing 1,000,000 decimals \cite{96d}.
							 | 
						|
								The computation took 8 hours on a Hewlett Packard 9000/712 machine. After
							 | 
						|
								distributing on a cluster of 4 HP 9000/712 machines the same computation
							 | 
						|
								required only 2.5 hours. The half hour was necessary for reading the partial
							 | 
						|
								results from disk and for recombining them. Again, we have used binary-splitting
							 | 
						|
								for recombining: the 4 partial result produced 2 results which were combined
							 | 
						|
								to the final 1,000,000 decimals value of \( \zeta (3) \).
							 | 
						|
								
							 | 
						|
								This example shows the importance of checkpointing. Even if a machine crashes
							 | 
						|
								through the calculation, the results of the other machines are still usable.
							 | 
						|
								Additionally, being able to parallelise the computation reduced the computing
							 | 
						|
								time dramatically.
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\section{Evaluation of linearly convergent series of sums}
							 | 
						|
								
							 | 
						|
								The technique presented in the previous section also applies to all linearly
							 | 
						|
								convergent sums of the form
							 | 
						|
								
							 | 
						|
								\[
							 | 
						|
								U=\sum ^{\infty }_{n=0}\frac{a(n)}{b(n)}\left( \frac{c(0)}{d(0)}+\cdots 
							 | 
						|
								+\frac{c(n)}{d(n)}\right) \frac{p(0)\cdots p(n)}{q(0)\cdots q(n)}\]
							 | 
						|
								 where  \( a(n) \),  \( b(n) \),  \( c(n) \),  \( d(n) \),  \( p(n) \),  
							 | 
						|
								\( q(n) \) are integers with  \( O(\log n) \) bits. The most often
							 | 
						|
								used case is again that  \( a(n) \),  \( b(n) \),  \( c(n) \),  \( d(n) \),  
							 | 
						|
								\( p(n) \),  \( q(n) \) are polynomials in  \( n \) with
							 | 
						|
								integer coefficients.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Algorithm:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								Given two index bounds  \( n_{1} \)and  \( n_{2} \), consider the partial sums 
							 | 
						|
								\[
							 | 
						|
								S=\sum _{n_{1}\leq n<n_{2}}\frac{a(n)}{b(n)}
							 | 
						|
								\frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
							 | 
						|
								and 
							 | 
						|
								\[
							 | 
						|
								U=\sum _{n_{1}\leq n<n_{2}}\frac{a(n)}{b(n)}
							 | 
						|
								\left( \frac{c(n_{1})}{d(n_{1})}+\cdots +\frac{c(n)}{d(n)}\right) 
							 | 
						|
								\frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								As above, we compute the integers  \( P={p(n_{1})}\cdots {p(n_{2}-1)} \),  
							 | 
						|
								\( Q={q(n_{1})}\cdots {q(n_{2}-1)} \),  \( B={b(n_{1})}\cdots {b(n_{2}-1)} \),  
							 | 
						|
								\( T=BQS \),  \( D={d(n_{1})}\cdots {d(n_{2}-1)} \),  
							 | 
						|
								\( C=D\left( \frac{c(n_{1})}{d(n_{1})}+\cdots +
							 | 
						|
								\frac{c(n_{2}-1)}{d(n_{2}-1)}\right)  \) and  \( V=DBQU \). 
							 | 
						|
								If  \( n_{2}-n_{1}<4 \), these
							 | 
						|
								are computed directly. If  \( n_{2}-n_{1}\geq 4 \), they are computed using 
							 | 
						|
								{\em binary splitting}: Choose an index  \( n_{m} \) in the middle of
							 | 
						|
								\( n_{1} \)and  \( n_{2} \), compute the components \( P_{l} \),  \( Q_{l} \),
							 | 
						|
								\( B_{l} \),  \( T_{l} \),  \( D_{l} \),  \( C_{l} \),  \( V_{l} \) belonging
							 | 
						|
								to the interval  \( n_{1}\leq n<n_{m} \), compute the components \( P_{r} \),
							 | 
						|
								\( Q_{r} \),  \( B_{r} \),  \( T_{r} \),  \( D_{r} \),  \( C_{r} \),  
							 | 
						|
								\( V_{r} \) belonging to the interval  \( n_{m}\leq n<n_{2} \), and set  
							 | 
						|
								\( P=P_{l}P_{r} \),  \( Q=Q_{l}Q_{r} \),  \( B=B_{l}B_{r} \), 
							 | 
						|
								\( T=B_{r}Q_{r}T_{l}+B_{l}P_{l}T_{r} \),  \( D=D_{l}D_{r} \),  
							 | 
						|
								\( C=C_{l}D_{r}+C_{r}D_{l} \) and  
							 | 
						|
								\( V=D_{r}B_{r}Q_{r}V_{l}+D_{r}C_{l}B_{l}P_{l}T_{r}+D_{l}B_{l}P_{l}V_{r} \).
							 | 
						|
								
							 | 
						|
								Finally, this algorithm is applied to \( n_{1}=0 \) and 
							 | 
						|
								\( n_{2}=n_{\max}=O(N) \), and final floating-point divisions 
							 | 
						|
								\( S=\frac{T}{BQ} \) and  \( U=\frac{V}{DBQ} \) are performed.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Complexity:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								The bit complexity of computing  \( S \) and  \( U \) with  \( N \) bits of
							 | 
						|
								precision is \( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Proof:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								By our assumption that  \( a(n) \),  \( b(n) \),  \( c(n) \),  \( d(n) \),  
							 | 
						|
								\( p(n) \),  \( q(n) \) are integers with  \( O(\log n) \) bits,
							 | 
						|
								the integers  \( P \),  \( Q \),  \( B \),  \( T \),  \( D \),  \( C \),  
							 | 
						|
								\( V \) belonging to the interval  \( n_{1}\leq n<n_{2} \) all have
							 | 
						|
								 \( O((n_{2}-n_{1})\log n_{2}) \) bits. The rest of the proof is as in the 
							 | 
						|
								previous section.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Checkpointing/Parallelising:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								A checkpoint can be easily done by storing the (integer) values of
							 | 
						|
								\( n_1 \),  \( n_2 \), \( P \),  \( Q \),  \( B \), \( T \) and additionally
							 | 
						|
								\( D \),  \( C \),  \( V \). Similarly, if \( m \) processors are available,
							 | 
						|
								then the interval \( [0,n_{max}] \) can be divided into \( m \) pieces of
							 | 
						|
								length \( l = \lfloor n_{max}/m \rfloor \). After each processor \( i \) has
							 | 
						|
								computed the sum of its interval \( [il,(i+1)l] \), the partial sums are
							 | 
						|
								combined to the final result using the rules described above.
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Implementation:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								The C++ implementation of the above algorithm is very similar
							 | 
						|
								to the previous one. The initialisation is done now by a structure 
							 | 
						|
								{\tt abpqcd\_series} containing arrays {\tt a}, {\tt b}, {\tt p},
							 | 
						|
								{\tt q}, {\tt c} and {\tt d} of multiprecision integers. The values of
							 | 
						|
								the arrays at the index \( n \) correspond to the values of the functions
							 | 
						|
								\( a \), \( b \), \( p \), \( q \), \( c \) and \( d \) at the integer point
							 | 
						|
								\( n \). The (partial) results of the algorithm are stored in the
							 | 
						|
								{\tt abpqcd\_series\_result} structure, which now contains 3 new elements
							 | 
						|
								({\tt C}, {\tt D} and {\tt V}).
							 | 
						|
								
							 | 
						|
								\begin{verbatim}
							 | 
						|
								// abpqcd_series is initialised by user
							 | 
						|
								struct { bigint *a, *b, *p, *q, *c, *d; 
							 | 
						|
								       } abpqcd_series;
							 | 
						|
								
							 | 
						|
								// abpqcd_series_result holds the partial results
							 | 
						|
								struct { bigint P, Q, B, T, C, D, V; 
							 | 
						|
								       } abpqcd_series_result;
							 | 
						|
								
							 | 
						|
								void sum_abpqcd(abpqcd_series_result & r, 
							 | 
						|
								                int n1, int n2, 
							 | 
						|
								                const abpqcd_series & arg)
							 | 
						|
								{
							 | 
						|
								  switch (n2 - n1)
							 | 
						|
								  {
							 | 
						|
								    case 0:
							 | 
						|
								      error_handler("summation device", 
							 | 
						|
								            "sum_abpqcd:: n2-n1 should be > 0.");
							 | 
						|
								      break;
							 | 
						|
								
							 | 
						|
								    case 1: // the result at the point n1
							 | 
						|
								      r.P = arg.p[n1];
							 | 
						|
								      r.Q = arg.q[n1];
							 | 
						|
								      r.B = arg.b[n1];
							 | 
						|
								      r.T = arg.a[n1] * arg.p[n1];
							 | 
						|
								      r.D = arg.d[n1];
							 | 
						|
								      r.C = arg.c[n1];
							 | 
						|
								      r.V = arg.a[n1] * arg.c[n1] * arg.p[n1];
							 | 
						|
								      break;
							 | 
						|
								
							 | 
						|
								    // cases 2, 3, 4 left out for simplicity
							 | 
						|
								
							 | 
						|
								    default: // general case
							 | 
						|
								     
							 | 
						|
								      // the left and the right partial sum
							 | 
						|
								      abpqcd_series_result L, R;
							 | 
						|
								
							 | 
						|
								      // find the middle of the interval
							 | 
						|
								      int nm = (n1 + n2) / 2;
							 | 
						|
								 
							 | 
						|
								      // sum left side
							 | 
						|
								      sum_abpqcd(L, n1, nm, arg);
							 | 
						|
								
							 | 
						|
								      // sum right side
							 | 
						|
								      sum_abpqcd(R, nm, n2, arg);
							 | 
						|
								
							 | 
						|
								      // put together
							 | 
						|
								      r.P = L.P * R.P;
							 | 
						|
								      r.Q = R.Q * L.Q;
							 | 
						|
								      r.B = L.B * R.B;
							 | 
						|
								      bigint tmp = L.B * L.P * R.T;
							 | 
						|
								      r.T = R.B * R.Q * L.T + tmp;
							 | 
						|
								      r.D = L.D * R.D;
							 | 
						|
								      r.C = L.C * R.D + R.C * L.D;
							 | 
						|
								      r.V = R.D * (R.B * R.Q * L.V + L.C * tmp) 
							 | 
						|
								            + L.D * L.B * L.P * R.V;
							 | 
						|
								      break;
							 | 
						|
								  }
							 | 
						|
								}
							 | 
						|
								\end{verbatim}
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\subsection{Example: Euler's constant \( C \) \label{eulergamma}}
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Theorem:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								Let  \( f(x)=\sum ^{\infty }_{n=0}\frac{x^{n}}{n!^{2}} \) and  
							 | 
						|
								\( g(x)=\sum ^{\infty }_{n=0}H_{n}\frac{x^{n}}{n!^{2}} \). Then for  
							 | 
						|
								\( x\rightarrow \infty  \),  
							 | 
						|
								\( \frac{g(x)}{f(x)}=\frac{1}{2}\log x+C+O\left( e^{-4\sqrt{x}}\right)  \).
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Proof:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								The Laplace method for asymptotic evaluation of exponentially growing
							 | 
						|
								sums and integrals yields 
							 | 
						|
								\[
							 | 
						|
								f(x)=
							 | 
						|
								e^{2\sqrt{x}}x^{-\frac{1}{4}}\frac{1}{2\sqrt{\pi }}(1+O(x^{-\frac{1}{4}}))\]
							 | 
						|
								 and 
							 | 
						|
								\[
							 | 
						|
								g(x)=e^{2\sqrt{x}}x^{-\frac{1}{4}}\frac{1}{2\sqrt{\pi }}
							 | 
						|
								\left(\frac{1}{2}\log x+C+O(\log x\cdot x^{-\frac{1}{4}})\right)\]
							 | 
						|
								On the other hand,  \( h(x):=\frac{g(x)}{f(x)} \) satisfies the
							 | 
						|
								differential equation 
							 | 
						|
								\[
							 | 
						|
								xf(x)\cdot h''(x)+(2xf'(x)+f(x))\cdot h'(x)=f'(x)\]
							 | 
						|
								hence 
							 | 
						|
								\[
							 | 
						|
								h(x)=\frac{1}{2}\log x+C+c_{2}
							 | 
						|
								\int ^{\infty }_{x}\frac{1}{tf(t)^{2}}dt=\frac{1}{2}\log x+C+O(e^{-4\sqrt{x}})\]
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Algorithm:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								To compute  \( C \) with a precision of  \( N \) bits, set  
							 | 
						|
								\[ x=\left\lceil (N+2)\: \frac{\log 2}{4}\right\rceil ^{2} \]
							 | 
						|
								and evaluate the series for  \( g(x) \) and  \( f(x) \) simultaneously,
							 | 
						|
								using the binary-splitting algorithm,
							 | 
						|
								with  \( a(n)=1 \),  \( b(n)=1 \),  \( c(n)=1 \),  \( d(n)=n+1 \),  
							 | 
						|
								\( p(n)=x \),  \( q(n)=(n+1)^{2} \). Let  \( \alpha =3.591121477\ldots  \) 
							 | 
						|
								be the solution of the equation  \( -\alpha \log \alpha +\alpha +1=0 \). Then
							 | 
						|
								 \( \alpha \sqrt{x}-\frac{1}{4\log \alpha }\log \sqrt{x}+O(1) \) 
							 | 
						|
								terms of the series suffice for the relative error to be bounded
							 | 
						|
								by  \( 2^{-N} \).
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Complexity:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								The bit complexity of this algorithm is  \( O((\log N)^{2}M(N)) \).
							 | 
						|
								
							 | 
						|
								\begin{description}
							 | 
						|
								\item [Note:]~
							 | 
						|
								\end{description}
							 | 
						|
								
							 | 
						|
								This algorithm was first mentioned in \cite{80}. It is by far
							 | 
						|
								the fastest known algorithm for computing Euler's constant. 
							 | 
						|
								
							 | 
						|
								For Euler's constant there is no checkpointing possible because 
							 | 
						|
								of the dependency on \( x \) in the binary splitting.
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\section{Computational results}
							 | 
						|
								
							 | 
						|
								In this section we present some computational results of our CLN and
							 | 
						|
								{\sf LiDIA} implementation of the algorithms presented in this note. We use
							 | 
						|
								the official version (1.3) and an experimental version (1.4a) of {\sf LiDIA}.
							 | 
						|
								We have taken advantage of {\sf LiDIA}'s ability to replace its kernel
							 | 
						|
								(multiprecision arithmetic and memory management) \cite{95,97,97b}, so we were
							 | 
						|
								able to use in both cases CLN's fast integer arithmetic routines.
							 | 
						|
								
							 | 
						|
								\subsection{Timings}
							 | 
						|
								
							 | 
						|
								The table in Figure \ref{Fig1} shows the running times for the calculation of
							 | 
						|
								\( \exp(1) \), \( \log(2) \), \( \pi \), \( C \), \( G \) and \( \zeta(3) \)
							 | 
						|
								to precision 100, 1000, 10000 and 100000 decimal digits. The timings are given
							 | 
						|
								in seconds and they denote the {\em real} time needed, i.e. system and user
							 | 
						|
								time. The computation was done on an Intel Pentium with 133Hz and 32MB of RAM.
							 | 
						|
								
							 | 
						|
								\begin{figure}[htb]
							 | 
						|
								\begin{center}
							 | 
						|
								\begin{tabular}{|l|l|l|l|l|l|l|}
							 | 
						|
								\hline
							 | 
						|
								D      &\( \exp(1) \)&\( \log(2) \)&\( \pi \)&\( C \)  &\( G \)&\( \zeta(3) \)\\
							 | 
						|
								\hline
							 | 
						|
								\hline
							 | 
						|
								\( 10^2 \) &0.0005   & 0.0020      &0.0014   & 0.0309  &0.0179 & 0.0027 \\
							 | 
						|
								\hline
							 | 
						|
								\( 10^3 \) &0.0069   & 0.0474      &0.0141   & 0.8110  &0.3580 & 0.0696 \\
							 | 
						|
								\hline
							 | 
						|
								\( 10^4 \) &0.2566   & 1.9100      &0.6750   & 33.190  &13.370 & 2.5600 \\
							 | 
						|
								\hline
							 | 
						|
								\( 10^5 \) &5.5549   & 45.640      &17.430   & 784.93  &340.33 & 72.970 \\
							 | 
						|
								\hline
							 | 
						|
								\end{tabular}
							 | 
						|
								\caption{{\sf LiDIA-1.4a} timings of computation of constants using
							 | 
						|
								binary-splitting}\label{Fig1}
							 | 
						|
								\end{center}
							 | 
						|
								\end{figure}
							 | 
						|
								
							 | 
						|
								The second table (Figure \ref{Fig2}) summarizes the performance of
							 | 
						|
								\( exp(x) \) in various Computer Algebra systems\footnote{We do not list
							 | 
						|
								the timings of {\sf LiDIA-1.4a} since these are comparable to those of CLN.}.
							 | 
						|
								For a fair comparison of the algorithms, both argument and precision are
							 | 
						|
								chosen in such a way, that system--specific optimizations (BCD arithmetic
							 | 
						|
								in Maple, FFT multiplication in CLN, special exact argument handling in 
							 | 
						|
								{\sf LiDIA}) do not work. We use \( x = -\sqrt{2} \) and precision 
							 | 
						|
								\( 10^{(i/3)} \), with \( i \) running from \( 4 \) to \( 15 \).
							 | 
						|
								
							 | 
						|
								\begin{figure}[htb]
							 | 
						|
								\begin{center}
							 | 
						|
								\begin{tabular}{|r|l|l|l|l|}
							 | 
						|
								\hline
							 | 
						|
								D           &  Maple & Pari & {\sf LiDIA-1.3} & CLN \\
							 | 
						|
								\hline
							 | 
						|
								\hline
							 | 
						|
								21          & 0.00090   & 0.00047 & 0.00191           & 0.00075 \\
							 | 
						|
								\hline
							 | 
						|
								46          & 0.00250   & 0.00065 & 0.00239           & 0.00109 \\
							 | 
						|
								\hline
							 | 
						|
								100         & 0.01000   & 0.00160 & 0.00389           & 0.00239 \\
							 | 
						|
								\hline
							 | 
						|
								215         & 0.03100   & 0.00530 & 0.00750           & 0.00690 \\
							 | 
						|
								\hline
							 | 
						|
								464         & 0.11000   & 0.02500 & 0.02050           & 0.02991 \\
							 | 
						|
								\hline
							 | 
						|
								1000        & 0.4000    & 0.2940  & 0.0704            & 0.0861 \\
							 | 
						|
								\hline
							 | 
						|
								2154        & 1.7190    & 0.8980  & 0.2990            & 0.2527 \\
							 | 
						|
								\hline
							 | 
						|
								4641        & 8.121     & 5.941   & 1.510             & 0.906 \\
							 | 
						|
								\hline
							 | 
						|
								10000       & 39.340    & 39.776  & 7.360             & 4.059 \\
							 | 
						|
								\hline
							 | 
						|
								21544       & 172.499   & 280.207 & 39.900            & 15.010 \\
							 | 
						|
								\hline
							 | 
						|
								46415       & 868.841   & 1972.184& 129.000           & 39.848 \\
							 | 
						|
								\hline
							 | 
						|
								100000      & 4873.829  & 21369.197& 437.000           & 106.990 \\
							 | 
						|
								\hline
							 | 
						|
								\end{tabular}
							 | 
						|
								\caption{Timings of computation of \( \exp(-\sqrt{2}) \)}\label{Fig2}
							 | 
						|
								\end{center}
							 | 
						|
								\end{figure}
							 | 
						|
								
							 | 
						|
								MapleV R3 is the slowest system in this comparison. This is probably due to
							 | 
						|
								the BCD arithmetic it uses. However, Maple seems to have an asymptotically
							 | 
						|
								better algorithm for \( exp (x) \) for numbers having more than 10000 decimals.
							 | 
						|
								In this range it outperforms Pari-1.39.03, which is the fastest system in the
							 | 
						|
								0--200 decimals range.
							 | 
						|
								
							 | 
						|
								The comparison indicating the strength of binary-splitting is between
							 | 
						|
								{\sf LiDIA-1.3} and CLN itself. Having the same kernel, the only
							 | 
						|
								difference is here that {\sf LiDIA-1.3} uses Brent's \( O(\sqrt{n}M(n)) \)
							 | 
						|
								for \( \exp(x) \), whereas CLN changes from Brent's method to a
							 | 
						|
								binary-splitting version for large numbers.
							 | 
						|
								
							 | 
						|
								As expected in the range of 1000--100000 decimals CLN outperforms
							 | 
						|
								{\sf LiDIA-1.3} by far. The fact that {\sf LiDIA-1.2.1} is faster
							 | 
						|
								in the range of 200--1000 decimals (also in some trig. functions)
							 | 
						|
								is probably due to a better optimized \( O(\sqrt{n}M(n)) \) method
							 | 
						|
								for \( \exp(x) \).
							 | 
						|
								
							 | 
						|
								\subsection {Distributed computing of \( \zeta (3) \)}
							 | 
						|
								
							 | 
						|
								Using the method described in \ref{zeta} the authors were the first to 
							 | 
						|
								compute 1,000,000 decimals of \( \zeta (3) \) \cite{96d}.
							 | 
						|
								The computation took 8 hours on a Hewlett Packard 9000/712 machine. After
							 | 
						|
								distributing on a cluster of 4 HP 9000/712 machines the same computation
							 | 
						|
								required only 2.5 hours. The half hour was necessary for reading the partial
							 | 
						|
								results from disk and for recombining them. Again, we have used binary-splitting
							 | 
						|
								for recombining: the 4 partial result produced 2 results which were combined
							 | 
						|
								to the final 1,000,000 decimals value of \( \zeta (3) \).
							 | 
						|
								
							 | 
						|
								This example shows the importance of checkpointing. Even if a machine crashes
							 | 
						|
								through the calculation, the results of the other machines are still usable.
							 | 
						|
								Additionally, being able to parallelise the computation reduced the computing
							 | 
						|
								time dramatically.
							 | 
						|
								
							 | 
						|
								\subsection{Euler's constant \( C \)}
							 | 
						|
								
							 | 
						|
								We have implemented a version of Brent's and McMillan's algorithm \cite{80} and
							 | 
						|
								a version accelerated by binary-splitting as shown in \ref{eulergamma}.
							 | 
						|
								
							 | 
						|
								The computation of \( C \) was done twice on a SPARC-Ultra machine
							 | 
						|
								with 167 MHz and 256 MB of RAM. The first computation using the non-acellerated
							 | 
						|
								version required 160 hours. The result of this computation was then verified
							 | 
						|
								by the binary splitting version in (only) 14 hours.
							 | 
						|
								
							 | 
						|
								The first 475006 partial quotients of the continued fraction of \( C \)
							 | 
						|
								were computed on an Intel Pentium with 133 MHz and 32 MB of RAM in 3 hours
							 | 
						|
								using a programm by H. te Riele based on \cite{96e}, which was translated to
							 | 
						|
								{\sf LiDIA} for efficiency reasons. Computing the 475006th
							 | 
						|
								convergent produced the following improved theorem:
							 | 
						|
								
							 | 
						|
								\medskip
							 | 
						|
								
							 | 
						|
								\centerline{If \( C \) is a rational number, \(C=p/q\), then \( |q| > 10^{244663} \)}
							 | 
						|
								
							 | 
						|
								\medskip
							 | 
						|
								
							 | 
						|
								Details of this computation (including statistics on the partial 
							 | 
						|
								quotients) can be found in \cite{98}.
							 | 
						|
								
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\section{Conclusions}
							 | 
						|
								
							 | 
						|
								Although powerful, the binary splitting method has not been widely used.
							 | 
						|
								Especially, no information existed on the applicability of this method.
							 | 
						|
								
							 | 
						|
								In this note we presented a generic binary-splitting summation device for
							 | 
						|
								evaluating two types of linearly convergent series. From this we derived simple
							 | 
						|
								and computationally efficient algorithms for the evaluation of elementary
							 | 
						|
								functions and constants. These algorithms work with {\em exact}
							 | 
						|
								objects, making them suitable for use within Computer Algebra systems.
							 | 
						|
								
							 | 
						|
								We have shown that the practical performance of our algorithms is
							 | 
						|
								superior to current system implementations. In addition to existing methods,
							 | 
						|
								our algorithms provide the possibility of checkpointing and parallelising.
							 | 
						|
								These features can be useful for huge calculations, such as those done in
							 | 
						|
								analytic number theory research.
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\section{Thanks}
							 | 
						|
								
							 | 
						|
								The authors would like to thank J\"org Arndt, for pointing us to
							 | 
						|
								chapter 10 in \cite{87}. We would also like to thank Richard P. Brent for 
							 | 
						|
								his comments and Hermann te Riele for providing us his program for the
							 | 
						|
								continued fraction computation of Euler's constant.
							 | 
						|
								
							 | 
						|
								%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
							 | 
						|
								\bibliography{binsplit}
							 | 
						|
								\bibliographystyle{acm}
							 | 
						|
								
							 | 
						|
								\end{document}
							 |