Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 173 struct evolution #174

Merged
merged 22 commits into from
Mar 16, 2020
Merged

Issue 173 struct evolution #174

merged 22 commits into from
Mar 16, 2020

Conversation

max-jacobs
Copy link
Contributor

Addresses #173

@max-jacobs max-jacobs requested a review from a team March 10, 2020 16:35
"struct<name:string, city:string>",
structData,
1);
LOG.info(">>>> Table {} ", replicaCatalog.client().getTable(DATABASE, PARTITIONED_TABLE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was for debugging ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I see it in all the other tests too. So, may be it has some purpose. @patduin ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah its in all the tests, think its just for logging purposes. It's kind of handy, especially when checking a build

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just for logging, you can leave it if you find it handy, I'm ok either way

"struct<name:string, city:string, dob:string>",
structData,
2);
LOG.info(">>>> Table {} ", sourceCatalog.client().getTable(DATABASE, PARTITIONED_TABLE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one too.

LOG
.info("Found columns that have changed type, attempting to recreate target table with the new columns."
+ "Old columns: {}, new columns: {}", oldColumns, newColumns);
Table tempTable = new Table(newTable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this copy all the table properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does indeed. We can either (1) scrub all the table params here, or (2) I've created a DropTableService which removes custom table params before calling the drop table command. If thats overly complicated I can remove the new service and just do (1) instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so if we removed the params here then we don't have to worry about the potential of Beekeeper picking up the drop of the temp table etc.? I think I prefer that approach unless some of the params are useful to have on the temp table? I don't think any of the intermediate operations use them in any way so probably not? And to be clear, the original table params will be kept on the final end result table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay cool, I'm happy to do this, I'll remove the DropTableService, and just remove the params as part of this service.

The intermediate ops dont use the params, but i think beekeeper may pick up on the alter partition events (though will ignore them as the location doesnt change). And yes, the original params will be on the final table - i'll add some tests for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried this out, and actually it doesn't seem possible. We can't do an alter table on the table as it still has the old schema, so it fails with the same error. So the DropTableService looks better as we do the alter/drop once everything has been updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's go with that for now and keep an eye on it when this is in use.

client.alter_table(to.getDbName(), toTableName, to);
client.alter_table(from.getDbName(), fromTableName, from);
} finally {
client.dropTable(to.getDbName(), toTableNameTemp, false, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to double check beekeeper props and we don't accidentally delete folders when we drop this table.

@@ -49,19 +55,24 @@ public ReplicaFactory(
Supplier<CloseableMetaStoreClient> replicaMetaStoreClientSupplier,
HousekeepingListener housekeepingListener,
ReplicaCatalogListener replicaCatalogListener,
ReplicaTableFactoryProvider replicaTableFactoryPicker) {
ReplicaTableFactoryProvider replicaTableFactoryPicker,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ReplicaTableFactoryProvider replicaTableFactoryPicker,
ReplicaTableFactoryProvider replicaTableFactoryProvider,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

String toTableName = toTable.getTableName();
try {
fromTable.setTableName(toTableName);
toTable.setTableName(toTableName + TEMP_SUFFIX);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be deleted?

}
}

private StorageDescriptor getSd(List<FieldSchema> columns) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private StorageDescriptor getSd(List<FieldSchema> columns) {
private StorageDescriptor createStorageDescriptor(List<FieldSchema> columns) {

}

private List<Partition> createPartitions(int count) {
ArrayList<Partition> partitions = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ArrayList<Partition> partitions = new ArrayList<>();
List<Partition> partitions = new ArrayList<>();


@Test
public void removeParamsAndDrop() throws TException {
HashMap<String, String> params = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HashMap<String, String> params = new HashMap<>();
Map<String, String> params = new HashMap<>();


@Test
public void removeParamsAndDropCaseInsensitiveExternalTable() throws TException {
HashMap<String, String> params = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HashMap<String, String> params = new HashMap<>();
Map<String, String> params = new HashMap<>();

<artifactId>avro</artifactId>
<version>1.9.0</version>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move this down here? Ideally the dependencies should be alpha sorted by groupId and then artifactId in two sections - one for normal scope and another for test scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this was out of desperation initially to get the parquet files writing successfully. Will revert.

@@ -172,6 +165,28 @@
<artifactId>protobuf-java</artifactId>
<scope>test</scope>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be wary of setting these to version numbers other than what the Hive/Hadoop dependencies are using. I guess this works but I just know how fragile things can get as the newer versions aren't backwards compatible and the only reason we're not seeing an issue now is we haven't hit certain code paths that would trigger problems.

preserve-raw-xattrs: false
metrics-reporter:
period: 1
time-unit: SECONDS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line break

.noDefault()
.endRecord();

HashMap<String, String> structData = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HashMap<String, String> structData = new HashMap<>();
Map<String, String> structData = new HashMap<>();

try {
writer.write(record);
} catch (IOException e) {
e.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to throw this and fail the test if this happens?

Max Jacobs added 2 commits March 13, 2020 11:12
LOG
.info("Found columns that have changed type, attempting to recreate target table with the new columns."
+ "Old columns: {}, new columns: {}", oldColumns, newColumns);
Table tempTable = new Table(newTable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's go with that for now and keep an eye on it when this is in use.

}

@Test
public void renameTableExceptionThrown1() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NABD but would be nice to replace "1" and "2" in this and the next method with something more descriptive as to what the difference is. From what I can see it's about where the exception is thrown, when altering the to or the from table? So maybe something like renameToTableException and renameFromTableException?

@max-jacobs max-jacobs merged commit f6f5389 into master Mar 16, 2020
@max-jacobs max-jacobs deleted the issue-173-struct-evolution branch March 16, 2020 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants