From: Mild Shock
Newsgroups: comp.lang.prolog
Subject: Re: Higher Order Logic Programming and Autograd
Date: Tue, 11 Mar 2025 13:14:54 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20

But where is Autograd, automatic derivation from some symbolic input?

In general you can objectify neural networks, which I already did with
the Prolog list, and routines such as back/3 are pure Prolog. Basically
you could symbolically derive expit (the activation), mulderiv (the
product with the derivative of the activation) and mattran (the
Jacobian without activation) from a DAG of vector functions.

In a linear neural network, the Jacobian without activation is the same
as the weights, and expit has a simple derivative that is based on the
expit result itself, which is already stored as the activation:

/* g(x) = logistic function */
expit(X, Y) :- Y is 1/(1+exp(-X)).

/* g'(x) = g(x)*(1-g(x)) */
mulderiv(X, Y, Z) :- Z is X*Y*(1-Y).

A small symbolic sketch of this idea follows at the end, after the
quoted posts.

See also:

A Gentle Introduction to torch.autograd
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

Mild Shock wrote:
> What can we do with these new toys? We
> can implement vector operations and matrix
> operations, and then apply them, for example,
> to layered neural networks by
> representing them as:
>
> /**
>  * Network is represented as [N0,M1,N1,...,Mn,Nn]
>  * - Where N0 is the input neuron vector
>  * - Where N1 .. Nn-1 are the hidden neuron vectors
>  * - Where Nn is the output neuron vector
>  * - Where M1 .. Mn are the transition weight matrices
>  */
>
> ?- mknet([3,2], X).
> X = [''(-1, 1, 1), ''(''(1, 1, -1), ''(1, 1, -1)), ''(-1, 1)].
>
> The model evaluation at a data point
> is straightforward:
>
> eval([V], [V]) :- !.
> eval([V,M,_|L], [V,M|R]) :- !,
>    matmul(M, V, H),
>    vecact(H, expit, J),
>    eval([J|L], R).
>
> The backward calculation of deltas
> is straightforward:
>
> back([V], U, [D]) :- !,
>    vecact(U, V, sub, E),
>    vecact(E, V, mulderiv, D).
> back([V,M,W|L], U, [D2,M,D|R]) :-
>    back([W|L], U, [D|R]),
>    mattran(M, M2),
>    matmul(M2, D, E),
>    vecact(E, V, mulderiv, D2).
>
> You can use this to compute weight changes
> and drive a gradient algorithm.
>
> Mild Shock wrote:
>> Somehow I shied away from implementing call/n for
>> my new Prolog system. I thought my new Prolog system
>> has only monomorphic caches, so I would never be able to
>> replicate what I did for my old Prolog system with
>> arity-polymorphic caches. This changed when I had
>> the idea to dynamically add a cache for the duration
>> of a higher-order loop such as maplist/n, foldl/n etc.
>>
>> So this is the new implementation of maplist/3:
>>
>> % maplist(+Closure, +List, -List)
>> maplist(C, L, R) :-
>>     sys_callable_cacheable(C, D),
>>     sys_maplist(L, D, R).
>>
>> % sys_maplist(+List, +Closure, -List)
>> sys_maplist([], _, []).
>> sys_maplist([X|L], C, [Y|R]) :-
>>     call(C, X, Y),
>>     sys_maplist(L, C, R).
>>
>> It is similar to the SWI-Prolog implementation in that
>> it reorders the arguments for better first-argument
>> indexing. But the new thing is sys_callable_cacheable/2,
>> which prepares the closure to be called more
>> efficiently. The invocation of the closure is already
>> quite fast since call/3 is implemented natively,
>> but the cache adds a bit more speed. Here are some
>> measurements that I did:
>>
>> /* SWI-Prolog 9.3.20 */
>> ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
>>     maplist(succ,L,_),fail; true)), fail.
>> % 2,003,000 inferences, 0.078 CPU in 0.094 seconds
>>
>> /* Scryer Prolog 0.9.4-350 */
>> ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
>>     maplist(succ,L,_),fail; true)), fail.
>> % CPU time: 0.318s, 3_007_105 inferences
>>
>> /* Dogelog Player 1.3.1 */
>> ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
>>     maplist(succ,L,_),fail; true)), fail.
>> % Zeit 342 ms, GC 0 ms, Lips 11713646, Uhr 10.03.2025 09:18
>>
>> /* Trealla Prolog 2.64.6-2 */
>> ?- findall(X,between(1,1000,X),L), time((between(1,1000,_),
>>      maplist(succ,L,_),fail; true)), fail.
>> % Time elapsed 1.694s, 15004003 Inferences, 8.855 MLips
>>
>> Not surprisingly, SWI-Prolog is fastest. What was
>> a little surprising is that Scryer Prolog can do it quite
>> fast, possibly because they heavily use maplist/n all
>> over the place and came up with things like '$fast_call'
>> etc. in their call/n implementation. Trealla Prolog is
>> a little bit disappointing at the moment.
>>
>
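Here is the small symbolic sketch announced above. deriv/3 and the
expression syntax are just made-up names for illustration, not part of
any of the systems mentioned; the point is only that the derivative
which mulderiv/3 hard-codes could be produced mechanically from the
expression behind expit/2:

/* deriv(+Expr, +Var, -DExpr): symbolic derivative of Expr w.r.t. Var */
deriv(X, X, 1) :- !.
deriv(C, X, 0) :- atomic(C), C \== X, !.
deriv(A+B, X, DA+DB) :- deriv(A, X, DA), deriv(B, X, DB).
deriv(A-B, X, DA-DB) :- deriv(A, X, DA), deriv(B, X, DB).
deriv(-A, X, -DA) :- deriv(A, X, DA).
deriv(A*B, X, DA*B+A*DB) :- deriv(A, X, DA), deriv(B, X, DB).
deriv(A/B, X, (DA*B-A*DB)/(B*B)) :- deriv(A, X, DA), deriv(B, X, DB).
deriv(exp(A), X, exp(A)*DA) :- deriv(A, X, DA).

/* ?- deriv(1/(1+exp(-x)), x, D).                                     */
/* D is the unsimplified quotient-rule term; algebraically it equals  */
/* expit(x)*(1-expit(x)), which is why mulderiv/3 can reuse the       */
/* stored activation instead of recomputing exp(-X).                  */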
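And to make the last remark of the first quoted post concrete: the
deltas that back/3 returns drive the usual update DeltaM = Eta * D (x) A,
the outer product of a layer's delta D with the activation A in front of
the weights M. The helpers below (outer/3, scale/3) are again only a
sketch over plain lists; the quoted code stores vectors and matrices as
''(..) compounds, so a real implementation would work on those instead:

/* scale(+Vec, +Factor, -Scaled): multiply every component by Factor */
scale([], _, []).
scale([X|Xs], F, [Y|Ys]) :- Y is F*X, scale(Xs, F, Ys).

/* outer(+Delta, +Act, -Grad): Grad[i][j] = Delta[i]*Act[j], one row  */
/* per component of Delta, matching the matmul(M, V, H) convention    */
/* where M maps the layer in front of it to the layer behind it       */
outer([], _, []).
outer([D|Ds], A, [Row|Rows]) :- scale(A, D, Row), outer(Ds, A, Rows).

The new weights are then M[i][j] + Eta*Grad[i][j], computed for each
weight matrix from the activation in front of it (taken from the eval/2
result) and the delta behind it (taken from the back/3 result).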
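Finally, for comparison with the quoted sys_maplist/3: the textbook
argument order puts the closure first, roughly like this (plain_maplist/3
is just a made-up name):

plain_maplist(_, [], []).
plain_maplist(C, [X|Xs], [Y|Ys]) :-
    call(C, X, Y),
    plain_maplist(C, Xs, Ys).

On a system that only indexes on the first argument this version cannot
discriminate the []/[_|_] cases and may leave a choice point on every
step, which is exactly what putting the list first, as in the quoted
sys_maplist/3, avoids.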