-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 173 struct evolution #174
Conversation
...in-core/src/main/java/com/hotels/bdp/circustrain/core/replica/hive/RenameTableOperation.java
Outdated
Show resolved
Hide resolved
"struct<name:string, city:string>", | ||
structData, | ||
1); | ||
LOG.info(">>>> Table {} ", replicaCatalog.client().getTable(DATABASE, PARTITIONED_TABLE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was for debugging ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. I see it in all the other tests too. So, may be it has some purpose. @patduin ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah its in all the tests, think its just for logging purposes. It's kind of handy, especially when checking a build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's just for logging, you can leave it if you find it handy, I'm ok either way
"struct<name:string, city:string, dob:string>", | ||
structData, | ||
2); | ||
LOG.info(">>>> Table {} ", sourceCatalog.client().getTable(DATABASE, PARTITIONED_TABLE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one too.
LOG | ||
.info("Found columns that have changed type, attempting to recreate target table with the new columns." | ||
+ "Old columns: {}, new columns: {}", oldColumns, newColumns); | ||
Table tempTable = new Table(newTable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this copy all the table properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does indeed. We can either (1) scrub all the table params here, or (2) I've created a DropTableService
which removes custom table params before calling the drop table command. If thats overly complicated I can remove the new service and just do (1) instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so if we removed the params here then we don't have to worry about the potential of Beekeeper picking up the drop of the temp table etc.? I think I prefer that approach unless some of the params are useful to have on the temp table? I don't think any of the intermediate operations use them in any way so probably not? And to be clear, the original table params will be kept on the final end result table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay cool, I'm happy to do this, I'll remove the DropTableService, and just remove the params as part of this service.
The intermediate ops dont use the params, but i think beekeeper may pick up on the alter partition events (though will ignore them as the location doesnt change). And yes, the original params will be on the final table - i'll add some tests for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried this out, and actually it doesn't seem possible. We can't do an alter table on the table as it still has the old schema, so it fails with the same error. So the DropTableService looks better as we do the alter/drop once everything has been updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's go with that for now and keep an eye on it when this is in use.
client.alter_table(to.getDbName(), toTableName, to); | ||
client.alter_table(from.getDbName(), fromTableName, from); | ||
} finally { | ||
client.dropTable(to.getDbName(), toTableNameTemp, false, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to double check beekeeper props and we don't accidentally delete folders when we drop this table.
...train-core/src/main/java/com/hotels/bdp/circustrain/core/replica/hive/AlterTableService.java
Show resolved
Hide resolved
@@ -49,19 +55,24 @@ public ReplicaFactory( | |||
Supplier<CloseableMetaStoreClient> replicaMetaStoreClientSupplier, | |||
HousekeepingListener housekeepingListener, | |||
ReplicaCatalogListener replicaCatalogListener, | |||
ReplicaTableFactoryProvider replicaTableFactoryPicker) { | |||
ReplicaTableFactoryProvider replicaTableFactoryPicker, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReplicaTableFactoryProvider replicaTableFactoryPicker, | |
ReplicaTableFactoryProvider replicaTableFactoryProvider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done :)
String toTableName = toTable.getTableName(); | ||
try { | ||
fromTable.setTableName(toTableName); | ||
toTable.setTableName(toTableName + TEMP_SUFFIX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be deleted?
} | ||
} | ||
|
||
private StorageDescriptor getSd(List<FieldSchema> columns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private StorageDescriptor getSd(List<FieldSchema> columns) { | |
private StorageDescriptor createStorageDescriptor(List<FieldSchema> columns) { |
} | ||
|
||
private List<Partition> createPartitions(int count) { | ||
ArrayList<Partition> partitions = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayList<Partition> partitions = new ArrayList<>(); | |
List<Partition> partitions = new ArrayList<>(); |
|
||
@Test | ||
public void removeParamsAndDrop() throws TException { | ||
HashMap<String, String> params = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashMap<String, String> params = new HashMap<>(); | |
Map<String, String> params = new HashMap<>(); |
|
||
@Test | ||
public void removeParamsAndDropCaseInsensitiveExternalTable() throws TException { | ||
HashMap<String, String> params = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashMap<String, String> params = new HashMap<>(); | |
Map<String, String> params = new HashMap<>(); |
<artifactId>avro</artifactId> | ||
<version>1.9.0</version> | ||
</dependency> | ||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you move this down here? Ideally the dependencies should be alpha sorted by groupId and then artifactId in two sections - one for normal scope and another for test scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this was out of desperation initially to get the parquet files writing successfully. Will revert.
@@ -172,6 +165,28 @@ | |||
<artifactId>protobuf-java</artifactId> | |||
<scope>test</scope> | |||
</dependency> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be wary of setting these to version numbers other than what the Hive/Hadoop dependencies are using. I guess this works but I just know how fragile things can get as the newer versions aren't backwards compatible and the only reason we're not seeing an issue now is we haven't hit certain code paths that would trigger problems.
preserve-raw-xattrs: false | ||
metrics-reporter: | ||
period: 1 | ||
time-unit: SECONDS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line break
.noDefault() | ||
.endRecord(); | ||
|
||
HashMap<String, String> structData = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashMap<String, String> structData = new HashMap<>(); | |
Map<String, String> structData = new HashMap<>(); |
try { | ||
writer.write(record); | ||
} catch (IOException e) { | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to throw this and fail the test if this happens?
LOG | ||
.info("Found columns that have changed type, attempting to recreate target table with the new columns." | ||
+ "Old columns: {}, new columns: {}", oldColumns, newColumns); | ||
Table tempTable = new Table(newTable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's go with that for now and keep an eye on it when this is in use.
} | ||
|
||
@Test | ||
public void renameTableExceptionThrown1() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NABD but would be nice to replace "1" and "2" in this and the next method with something more descriptive as to what the difference is. From what I can see it's about where the exception is thrown, when altering the to or the from table? So maybe something like renameToTableException
and renameFromTableException
?
Addresses #173