Skip to content

[ZEPPELIN-6204] do not allow init params in JDBC URLs for H2 #4949

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

pjfanning
Copy link
Contributor

What is this PR for?

ZEPPELIN-6204

Slight tidy of the existing disallow list for strings in JDBC urls so that they are checked against just the query params and not the hostname in the URL.

What type of PR is it?

Bug Fix

Todos

  • - Task

What is the Jira issue?

How should this be tested?

  • Strongly recommended: add automated unit tests for any new or changed behavior
  • Outline any manual steps to test the PR here.

Screenshots (if appropriate)

Questions:

  • Does the license files need to update? no
  • Is there breaking changes for older versions? no
  • Does this needs documentation? no

@Reamer
Copy link
Contributor

Reamer commented Jun 26, 2025

I don't know whether we are preventing features that others want to use with these changes.

Doesn't it make more sense to secure the H2 server according to the requirements? After all, Zeppelin is just one JDBC client of many.

Reamer
Reamer previously approved these changes Jul 1, 2025
@pjfanning pjfanning marked this pull request as draft July 2, 2025 11:34
@pjfanning pjfanning marked this pull request as ready for review July 8, 2025 10:06
@pjfanning
Copy link
Contributor Author

@Reamer @jongyoul is it ok to consider this for merge now? There are probably a few more query params that could be added to the disallow list but this covers some of the more commonly mentioned ones.

@tbonelee
Copy link
Contributor

Just a small thought. I was wondering if decoding the full URL before parsing could lead to unexpected behavior in some edge cases. For example, if a parameter value contains encoded characters like %26 (&) or %3D (=), decoding them too early might interfere with how the parameters are parsed.

I’m not sure if values like passwords are actually allowed or commonly used in this context, so this might not be relevant. But I thought it was worth mentioning just in case. 🙏

Comment on lines +603 to +611
if (containsKeyIgnoreCase(params, ALLOW_LOAD_LOCAL) ||
containsKeyIgnoreCase(params, ALLOW_LOAD_LOCAL_IN_FILE_NAME) ||
containsKeyIgnoreCase(params, ALLOW_LOCAL_IN_FILE_NAME) ||
containsKeyIgnoreCase(params, ALLOW_URL_IN_LOCAL_IN_FILE_NAME) ||
containsKeyIgnoreCase(params, ALLOW_LOAD_LOCAL_IN_FILE_IN_PATH) ||
containsKeyIgnoreCase(params, AUTO_DESERIALIZE) ||
containsKeyIgnoreCase(params, SOCKET_FACTORY)) {
throw new IllegalArgumentException("Connection URL contains sensitive configuration");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (containsKeyIgnoreCase(params, ALLOW_LOAD_LOCAL) ||
containsKeyIgnoreCase(params, ALLOW_LOAD_LOCAL_IN_FILE_NAME) ||
containsKeyIgnoreCase(params, ALLOW_LOCAL_IN_FILE_NAME) ||
containsKeyIgnoreCase(params, ALLOW_URL_IN_LOCAL_IN_FILE_NAME) ||
containsKeyIgnoreCase(params, ALLOW_LOAD_LOCAL_IN_FILE_IN_PATH) ||
containsKeyIgnoreCase(params, AUTO_DESERIALIZE) ||
containsKeyIgnoreCase(params, SOCKET_FACTORY)) {
throw new IllegalArgumentException("Connection URL contains sensitive configuration");
}
Stream.of(ALLOW_LOAD_LOCAL, ALLOW_LOCAL_IN_FILE_NAME, ...).forEach(rule -> {
if (containsKeyIgnoreCase(params, rule)) throw new IllegalArgumentException("Connection URL contains sensitive configuration: " + rule);
});

I find this version easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't commit this unless it compiles so please don't use the suggestion option to suggest incomplete code.
I also disagree that this is more readable. The existing code in this PR is closer to the existing style in the existing codebase.

Comment on lines +159 to +174
private static final String ALLOW_LOAD_LOCAL = "allowLoadLocal";

private static final String ALLOW_LOAD_LOCAL_IN_FILE_NAME = "allowLoadLocalInfile";

private static final String AUTO_DESERIALIZE = "autoDeserialize";
private static final String ALLOW_LOAD_LOCAL_IN_FILE_IN_PATH = "allowLoadLocalInfileInPath";

private static final String ALLOW_LOCAL_IN_FILE_NAME = "allowLocalInfile";

private static final String ALLOW_URL_IN_LOCAL_IN_FILE_NAME = "allowUrlInLocalInfile";

private static final String AUTO_DESERIALIZE = "autoDeserialize";

private static final String SOCKET_FACTORY = "socketFactory";

private static final String INIT = "INIT";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be nice to group these into a Set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to stick with the existing code style.

}

// Split the URL into the base part and the parameters part
String[] parts = url.split("[?&;]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a bit faster if the regex was cached to avoid rebuilding every time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cost of this regex will be tiny compared to the I/O cost of using JDBC commands so I'm not very interested in optimising every line of code here

for (String pair : keyValuePairs) {
splitNameValue(pair, parameters);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work well if parentheses are in the URL query parameters?
jdbc:mysql://.../db?password=abc(def)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code doesn't care about what the values are, it only cares about the param names - write some test cases if you are suspicious

Copy link
Contributor

@seung-00 seung-00 Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the following cases:

testBannedMySQLQueryParamWithValidParam("password=allowLoadLocalInfile"); // 1. fail
testBannedMySQLQueryParamWithValidParam("password=(allowLoadLocalInfile)"); // 2. pass

I might be missing something, but test case 2 seems unintended.
Part of the value allowLoadLocalInfile appears to be treated as a key due to the parentheses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks - I'll look into the issue - but in the end of the day, it's not a very likely password
and the existing checks before this PR, they just do a string match on the whole URL so that code wouldn't handle this either

@pjfanning
Copy link
Contributor Author

Just a small thought. I was wondering if decoding the full URL before parsing could lead to unexpected behavior in some edge cases. For example, if a parameter value contains encoded characters like %26 (&) or %3D (=), decoding them too early might interfere with how the parameters are parsed.

I’m not sure if values like passwords are actually allowed or commonly used in this context, so this might not be relevant. But I thought it was worth mentioning just in case. 🙏

The code in this PR ignores the param values. It only worries about the param names.
The original URL as entered by the user is passed to the JDBC layer - my code splits up the URL but only for my checks for suspicious param names.

@tbonelee
Copy link
Contributor

@pjfanning Thanks for the clarification. I took a closer look and you're right. Decoding encoded special characters in values before validation would at most introduce an extra key, but it does not interfere with the parsing of subsequent parameters. I had initially worried about false negatives (i.e. cases where disallowed parameters might go undetected), but that doesn't seem to be the case. Thanks again, and sorry for the confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants