[Kernel] Ignore non-conforming commit files in Delta log directory #3552
base: master
```scala
assert(logSegment.isPresent)
checkLogSegment(
  logSegment.get(),
  expectedVersion = 14,
```
Without the fix, this returns 19 as the latest snapshot version.
```java
public static boolean isCommitFile(String filePathStr) {
  Path filePath = new Path(filePathStr);
  String fileName = filePath.getName();
  String fileParentName = filePath.getParent().getName();
```
[nit] Can we fetch `fileParentName` only when it is needed? These Path/URI operations are slow and can add noticeable overhead when applied to a large number of objects.
```java
if (DELTA_FILE_PATTERN.matcher(fileName).matches()) {
  return true;
} else {
  // Only look up the parent directory when the plain commit-file pattern fails.
  String fileParentName = filePath.getParent().getName();
  return COMMIT_SUBDIR.equals(fileParentName)
      && UUID_DELTA_FILE_REGEX.matcher(fileName).matches();
}
```
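The suggestion above can be sketched as a self-contained check that derives the parent directory name only when the cheap file-name test fails. This is a hypothetical sketch, not the Delta Kernel code: the two regexes, the `_commits` directory name, and the string-based path helpers (used instead of Hadoop `Path` so the example has no dependencies) are all assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a lazy commit-file check. The patterns, the
// "_commits" directory name, and the string path helpers are assumptions.
final class CommitFileCheck {
  // Assumed: a published commit is "<digits>.json", e.g. 00000000000000000007.json.
  static final Pattern DELTA_FILE_PATTERN = Pattern.compile("\\d+\\.json");
  // Assumed: an unbackfilled coordinated commit is "<digits>.<uuid>.json".
  static final Pattern UUID_DELTA_FILE_REGEX = Pattern.compile("\\d+\\.[^.]+\\.json");
  // Assumed name of the subdirectory that holds UUID-named commits.
  static final String COMMIT_SUBDIR = "_commits";

  static boolean isCommitFile(String filePathStr) {
    String fileName = lastSegment(filePathStr);
    if (DELTA_FILE_PATTERN.matcher(fileName).matches()) {
      return true; // cheap path: no parent lookup needed
    }
    // Only now derive the parent directory name, as the review suggests.
    String parentName = lastSegment(parentOf(filePathStr));
    return COMMIT_SUBDIR.equals(parentName)
        && UUID_DELTA_FILE_REGEX.matcher(fileName).matches();
  }

  // Text after the last '/', or the whole string if there is no '/'.
  private static String lastSegment(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // Text before the last '/', or "" if there is no '/'.
  private static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    return idx < 0 ? "" : path.substring(0, idx);
  }

  public static void main(String[] args) {
    System.out.println(isCommitFile("/tbl/_delta_log/00000000000000000007.json"));              // true
    System.out.println(isCommitFile("/tbl/_delta_log/00000000000000000007.abc.json"));          // false
    System.out.println(isCommitFile("/tbl/_delta_log/_commits/00000000000000000007.abc.json")); // true
  }
}
```

Plain string slicing keeps the sketch dependency-free; code working on Hadoop `Path` objects or URIs may also need to handle schemes and trailing slashes.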
A bunch of tests are failing; are these files not being recognized as commit files?
The branch was updated from 35c297d to 7a18073.
```diff
 long version = -1;
-if (FileNames.isCommitFile(fileName)) {
+if (FileNames.isCommitFile(filePath)) {
```
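Passing the full path rather than the bare file name matters because the decision depends on the parent directory. The hypothetical sketch below extracts a commit version from each listed path and keeps the maximum. The patterns, the `_commits` directory name, and the sample listing (versions 0 to 14 plus one stray UUID-named file directly under `_delta_log`) are assumptions for illustration, not the PR's test data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: latest commit version from a directory listing.
// Patterns and the "_commits" name are assumptions, not Delta Kernel code.
final class LatestVersionSketch {
  static final Pattern DELTA_FILE = Pattern.compile("(\\d+)\\.json");
  static final Pattern UUID_DELTA_FILE = Pattern.compile("(\\d+)\\.[^.]+\\.json");

  // Returns the version encoded in the path, or -1 if it is not a commit file.
  static long commitVersion(String filePath) {
    int slash = filePath.lastIndexOf('/');
    String fileName = filePath.substring(slash + 1);
    Matcher plain = DELTA_FILE.matcher(fileName);
    if (plain.matches()) {
      return Long.parseLong(plain.group(1));
    }
    // The parent directory is only known from the full path.
    String parent = slash < 0 ? "" : filePath.substring(0, slash);
    String parentName = parent.substring(parent.lastIndexOf('/') + 1);
    Matcher uuid = UUID_DELTA_FILE.matcher(fileName);
    if ("_commits".equals(parentName) && uuid.matches()) {
      return Long.parseLong(uuid.group(1));
    }
    return -1;
  }

  static long latestVersion(List<String> listing) {
    long latest = -1;
    for (String path : listing) {
      latest = Math.max(latest, commitVersion(path));
    }
    return latest;
  }

  public static void main(String[] args) {
    List<String> listing = new ArrayList<>();
    for (int v = 0; v <= 14; v++) {
      listing.add(String.format("/tbl/_delta_log/%020d.json", v));
    }
    // Hypothetical stray, non-conforming file directly under _delta_log.
    listing.add("/tbl/_delta_log/00000000000000000019.abc.json");
    System.out.println(latestVersion(listing)); // 14
  }
}
```

If the parent-directory condition were dropped, the stray file would be accepted and the result would be 19 instead of 14.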
Created #3586 to unify these utility APIs.
Description
Support for coordinated commits introduced an issue where a Delta file in `_delta_log` that doesn't conform to the expected file name format (`%020d.json`) is considered part of the Delta history.

We use the following method to test whether a file is a commit file.

Before coordinated commit support:

After coordinated commit support:

For example, if `_delta_log` contains a file named `0000x.uuid.json`, it is treated as part of the Delta history, but it should not be. Commit files of the format `%020d.<uuid>.json` are part of the table history only if their parent directory is `_commits`.

This PR updates the method as follows, in line with Delta Spark:
How was this patch tested?
Added a unit test that reproduces the issue without the fix.