Bug #10185

Remote-run exec for root fail with "rudder agent was interrupted"

Added by François ARMAND 9 months ago. Updated 9 months ago.

Status: Released
Priority: N/A
Category: Relay server or API

Description

The message is:

== [fanf@luhman16] 
% curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx"  -X POST 'https://192.168.44.2/rudder/api/latest/nodes/root/applyPolicy' -d "classes=inventory" 
error    Rudder agent was interrupted during execution by a fatal error
         Run with -i to see log messages.

## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.31s
################################################################################

== [fanf@luhman16] 
% curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx"  -X POST 'https://192.168.44.2/rudder/api/latest/nodes/6866e5db-bb41-4110-958b-c1f1c90dbcbe/applyPolicy'
error    Rudder agent was interrupted during execution by a fatal error
         Run with -i to see log messages.

## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.29s
################################################################################
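For scripted reproduction, the same call can be built with Python's standard library. This is a minimal sketch: the host and endpoint are copied from the curl commands above, the token is a placeholder, and `build_apply_policy_request` is a hypothetical helper name. It only constructs the request, so it can be inspected without contacting a server.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_apply_policy_request(base_url: str, node_id: str, token: str,
                               classes: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) the POST issued by the curl commands above."""
    url = f"{base_url}/rudder/api/latest/nodes/{urllib.parse.quote(node_id)}/applyPolicy"
    # curl -d sends an application/x-www-form-urlencoded body; no -d means no body
    data = urllib.parse.urlencode({"classes": classes}).encode() if classes else None
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("X-API-Token", token)
    if data is not None:
        req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req
```

To send it, pass the request to `urllib.request.urlopen`. Note that `curl -k` skips certificate verification; the Python equivalent is an `ssl.SSLContext` with `check_hostname = False` and `verify_mode = ssl.CERT_NONE`, which is only appropriate against a test server like this one.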

OK, after trying to start it a few more times on the remote node, it started working. I have no idea what was wrong, and there are no logs.

We need to at least add logs so that we can do some forensics when things don't work as expected.


Subtasks

Bug #10300: NumberFormatException on remote-api call for root (Rejected, François ARMAND)


Related issues

Related to Rudder - User story #10314: Document remote-run exec compatibility (New)

Associated revisions

Revision b6ca5a9f
Added by Alexis MOUSSET 9 months ago

Fixes #10185: Remote-run exec for root and nodes behind relays fail with "rudder agent was interrupted"

History

#1 Updated by François ARMAND 9 months ago

After editing /opt/rudder/share/relay-api/relay_api/remote_run.py on the root server to add "-i" to REMOTE_RUN_COMMAND, and restarting Apache, I now get:

rudder     info: ........................................................................
rudder     info: Hailing server.rudder.local : 5309
rudder     info: ........................................................................
   error: TRUST FAILED, server presented untrusted key: MD5=3275d8e38205fada95e6236901099527
   error: Failed to connect to host: server.rudder.local
error    Rudder agent was interrupted during execution by a fatal error

## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.30s
################################################################################

We should have an API option to return that output.
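One way to expose that output is to make the `-i` flag opt-in per request. The sketch below assumes a hypothetical `verbose` request parameter and uses a placeholder instead of the real `REMOTE_RUN_COMMAND` (its actual value in `remote_run.py` is not shown in this ticket); it also assumes the flag can simply be appended to the end of the command.

```python
import shlex
from typing import List, Mapping

# Placeholder: the real REMOTE_RUN_COMMAND is defined in
# /opt/rudder/share/relay-api/relay_api/remote_run.py and is not reproduced here.
REMOTE_RUN_COMMAND = "remote-run-placeholder"

TRUE_VALUES = {"1", "true", "yes"}


def wants_verbose(params: Mapping[str, str]) -> bool:
    """True when the (hypothetical) 'verbose' parameter asks for agent output."""
    return params.get("verbose", "false").strip().lower() in TRUE_VALUES


def build_remote_run_argv(params: Mapping[str, str]) -> List[str]:
    """Split the base command and append -i only when verbose output is requested."""
    argv = shlex.split(REMOTE_RUN_COMMAND)
    if wants_verbose(params):
        argv.append("-i")  # the flag that was added by hand in the experiment above
    return argv
```

Keeping `-i` opt-in leaves the default response unchanged for existing API clients.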

#2 Updated by François ARMAND 9 months ago

And sometimes, I don't get anything at all:

== [fanf@luhman16] ==
% curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx"  -X POST 'https://192.168.44.2/rudder/api/latest/nodes/c867b070-0721-43d3-8825-d78c51c2c632/applyPolicy'

#3 Updated by François ARMAND 9 months ago

  • Assignee set to Benoît PECCATTE

#4 Updated by François ARMAND 9 months ago

  • Assignee changed from Benoît PECCATTE to Nicolas CHARLES

#5 Updated by François ARMAND 9 months ago

  • Tags set to Blocking 4.1

#6 Updated by Nicolas CHARLES 9 months ago

The webapp log shows the following error:

java.lang.NumberFormatException: For input string: "Error when trying to contact internal remote-run API: null" 
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Byte.parseByte(Byte.java:149)
        at java.lang.Byte.parseByte(Byte.java:175)
        at scala.collection.immutable.StringLike.toByte(StringLike.scala:297)
        at scala.collection.immutable.StringLike.toByte$(StringLike.scala:297)
        at scala.collection.immutable.StringOps.toByte(StringOps.scala:29)
        at com.normation.rudder.web.rest.node.NodeApiService8.runResponse(NodeAPIService8.scala:119)
        at com.normation.rudder.web.rest.node.NodeApiService8.$anonfun$runNode$4(NodeAPIService8.scala:155)
        at com.normation.rudder.web.rest.node.NodeApiService8.$anonfun$runNode$4$adapted(NodeAPIService8.scala:155)
        at net.liftweb.http.LiftServlet.sendResponse(LiftServlet.scala:1040)
        at net.liftweb.http.LiftServlet.doService(LiftServlet.scala:451)
        at net.liftweb.http.LiftServlet.$anonfun$service$2(LiftServlet.scala:157)
        at net.liftweb.util.TimeHelpers.calcTime(TimeHelpers.scala:427)
...
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:369)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:464)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:913)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:975)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:641)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
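The trace shows `NodeApiService8.runResponse` calling `toByte` on text that turned out to be an error message, so the parse itself throws. Below is a minimal sketch of a defensive parse. It is written in Python rather than the webapp's Scala and assumes the body is meant to carry a numeric status in Java's byte range (-128 to 127). A non-numeric or out-of-range value is logged, which also keeps a trace for later forensics, and is reported as `None` instead of raising. `parse_run_status` is a hypothetical name.

```python
import logging
from typing import Optional

log = logging.getLogger("remote-run")

# java.lang.Byte.parseByte only accepts values in this range
BYTE_MIN, BYTE_MAX = -128, 127


def parse_run_status(body: str) -> Optional[int]:
    """Return the numeric status, or None (after logging) if the body is not one."""
    text = body.strip()
    try:
        value = int(text)
    except ValueError:
        # e.g. "Error when trying to contact internal remote-run API: null"
        log.error("unexpected remote-run response: %r", text)
        return None
    if not BYTE_MIN <= value <= BYTE_MAX:
        log.error("remote-run status out of byte range: %d", value)
        return None
    return value
```

Python's `int()` is slightly more lenient than `parseByte` (it accepts underscores, for example), so this approximates the JVM behavior rather than porting it.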

#7 Updated by Nicolas CHARLES 9 months ago

Ha, this message is probably more relevant:

Feb 28 12:02:32 server cf-serverd[1583]: CFEngine(server) rudder 127.0.0.1> Connection was hung up while receiving line: 
Feb 28 12:02:32 server cf-serverd[1583]: CFEngine(server) rudder 127.0.0.1> Client closed connection early! He probably does not trust our key...

#8 Updated by Nicolas CHARLES 9 months ago

Full verbose output:

rudder  verbose: Obtained IP address of '127.0.0.1' on socket 7 from accept
rudder  verbose: New connection (from 127.0.0.1, sd 7), spawning new thread...
rudder     info: 127.0.0.1> Accepting connection
rudder  verbose: 127.0.0.1> Setting socket timeout to 600 seconds.
rudder  verbose: 127.0.0.1> Peeked nothing important in TCP stream, considering the protocol as TLS
rudder  verbose: 127.0.0.1> TLS version negotiated:  TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
rudder  verbose: 127.0.0.1> TLS session established, checking trust...
rudder  verbose: 127.0.0.1> Remote peer terminated TLS session (SSL_read)
   error: 127.0.0.1> Connection was hung up while receiving line: 
  notice: 127.0.0.1> Client closed connection early! He probably does not trust our key..

#9 Updated by Nicolas CHARLES 9 months ago

Agent side:

rudder  verbose: Connected to host 192.168.41.2 address 192.168.41.2 port 5309 (socket descriptor 4)
rudder  verbose: TLS version negotiated:  TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
rudder  verbose: TLS session established, checking trust...
rudder  verbose: Did not find new key format '/var/rudder/cfengine-community/ppkeys/root-MD5=57ccba22df018012132877618ff655f9.pub'
rudder  verbose: Trying old style '/var/rudder/cfengine-community/ppkeys/root-192.168.41.2.pub'
rudder  verbose: Received key 'MD5=57ccba22df018012132877618ff655f9' not found in ppkeys
   error: TRUST FAILED, server presented untrusted key: MD5=57ccba22df018012132877618ff655f9
rudder  verbose: Connection to 192.168.41.2 is closed
   error: Failed to connect to host: 192.168.41.2
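The agent-side log shows the order in which the server's key is looked up in `ppkeys`: first the digest-based file name, then the older address-based one. The sketch below reconstructs that lookup from these log lines only; it is not CFEngine's code, and `candidate_key_paths` and `find_trusted_key` are hypothetical names. The `root-` prefix is taken as-is from the log.

```python
import os
from typing import List, Optional


def candidate_key_paths(ppkeys_dir: str, user: str, digest: str, address: str) -> List[str]:
    """Key file names tried, in the order shown in the verbose agent log."""
    return [
        os.path.join(ppkeys_dir, f"{user}-{digest}.pub"),   # new format, e.g. root-MD5=<hash>.pub
        os.path.join(ppkeys_dir, f"{user}-{address}.pub"),  # old style, e.g. root-<ip>.pub
    ]


def find_trusted_key(ppkeys_dir: str, user: str, digest: str, address: str) -> Optional[str]:
    """Return the first key file that exists, or None if neither is present."""
    for path in candidate_key_paths(ppkeys_dir, user, digest, address):
        if os.path.isfile(path):
            return path
    return None
```

In the log above neither file exists, which is consistent with the agent refusing the server's key and the server then reporting that the client closed the connection early.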

#10 Updated by Nicolas CHARLES 9 months ago

Ok, so after some tests:
  1. the server cannot remote-run itself; cf-runagent doesn't seem to support it
  2. remote-running a 4.1 node is OK
  3. remote-running a 4.0 node fails, as the command is not valid:
    root@agent1:/home/vagrant# /opt/rudder/bin/rudder agent run -uR -I -Dcfruncommand
    Rudder agent 4.0.4.rc1.git201702280322 (CFEngine Core 3.7.4)
    Node uuid: e04cdc24-2180-4d2e-b334-0445a13a3a45
    ok: Rudder agent promises were updated.
       error: Remote execution cannot ignore locks
    
  4. remote-running a 3.1 node fails, as the command /opt/rudder/bin/rudder agent run -uR -I -Dcfruncommand --inform is not valid:
    root@agent2:/home/vagrant# /opt/rudder/bin/rudder agent run -uR -Dcfruncommand --inform
    /opt/rudder/share/commands/agent-run: illegal option -- u
    /opt/rudder/share/commands/agent-run: illegal option -- -
    /opt/rudder/share/commands/agent-run: illegal option -- n
    /opt/rudder/share/commands/agent-run: illegal option -- o
    Rudder agent 3.1.19.rc1.git201702210714 (CFEngine Core 3.6.5)
    Node uuid: 791a6ebe-cfb1-4f54-b9a2-48ca162f64b6
    2017-02-28T12:38:05+0000    error: Remote execution cannot ignore locks
    

So the remote-run API should not try to remote-run on the local system, and we need a fix for 4.0 and 3.1 compatibility.
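These findings can be condensed into a small decision function. This is a hedged sketch: it assumes the local system is recognized by the node ID `root` (as in the first curl call; a relay would need a check against its own ID) and that versions look like the `4.0.4.rc1...` strings printed by the agent. It only classifies nodes, since the actual fix for 4.0 and 3.1 is not described here. `remote_run_plan` and `major_minor` are hypothetical names.

```python
from typing import Tuple

# Node ID of the Rudder root server, as used in the applyPolicy URL above
LOCAL_NODE_ID = "root"


def major_minor(version: str) -> Tuple[int, int]:
    """'4.0.4.rc1.git201702280322' -> (4, 0)"""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def remote_run_plan(node_id: str, agent_version: str) -> str:
    if node_id == LOCAL_NODE_ID:
        # cf-runagent cannot remote-run the system it is running on
        return "local"
    if major_minor(agent_version) >= (4, 1):
        return "remote"  # tested OK on 4.1 nodes
    # 4.0 and 3.1 agents reject the current command line and need a compatible one
    return "needs-compatible-command"
```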

#11 Updated by François ARMAND 9 months ago

  • Tags deleted (Blocking 4.1)
  • Category set to Relay server or API
  • Assignee changed from Nicolas CHARLES to Benoît PECCATTE

#12 Updated by François ARMAND 9 months ago

I'm leaving this ticket open to change the Relay API so that it calls rudder agent correctly. I'm opening a subticket to fix the null pointer exception on the Rudder side, which should not happen.

#13 Updated by Alexis MOUSSET 9 months ago

  • Status changed from New to In progress
  • Assignee changed from Benoît PECCATTE to Alexis MOUSSET

#14 Updated by Alexis MOUSSET 9 months ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis MOUSSET to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/1270

#15 Updated by Alexis MOUSSET 9 months ago

  • Status changed from Pending technical review to Pending release

#16 Updated by Nicolas CHARLES 9 months ago

#17 Updated by François ARMAND 9 months ago

  • Subject changed from Remote-run exec for root and nodes behind relays fail with "rudder agent was interrupted" to Remote-run exec for root fail with "rudder agent was interrupted"

#18 Updated by Vincent MEMBRÉ 9 months ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 4.1.0~rc1 which was released today.
